module character_encodings

Constants related to character encodings.

Global Variables

  • NEWLINE_BYTE
  • ENCODING
  • ASCII
  • UTF_8
  • UTF_16
  • UTF_32
  • ISO_8859_1
  • WINDOWS_1252
  • BOMS
  • UNPRINTABLE_ASCII
  • UNPRINTABLE_ISO_8859_1
  • UNPRINTABLE_UTF_8
  • UNPRINTABLE_WIN_1252
  • UNPRINTABLE_ISO_8859_7
  • ENCODINGS_TO_ATTEMPT
  • SINGLE_BYTE_ENCODINGS
  • WIDE_UTF_ENCODINGS
  • ENCODINGS

function scrub_c1_control_chars

scrub_c1_control_chars(char_map: dict) → None

Fill `char_map` in place with integer keys/values covering the range where a given character encoding defines no characters, because that range is reserved for C1 control characters (code points 0x80 to 0x9F, AKA the "undefined" part of most character maps).
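
The behavior described above can be pictured with a minimal sketch. This is not the module's implementation: the name `scrub_c1_control_chars_sketch` and the bracketed placeholder value are assumptions made for illustration, and the real function may store different values. The only grounded parts are that the dict is filled in place, that the keys are integers, and that C1 control characters occupy 0x80 through 0x9F.

```python
def scrub_c1_control_chars_sketch(char_map: dict) -> None:
    """Mark every C1 control code point (0x80..0x9F) in char_map, in place."""
    for code_point in range(0x80, 0xA0):
        # Hypothetical placeholder value; the real module may store something else.
        char_map[code_point] = f"[{code_point}]"


# Start from a map with one ordinary entry and scrub the C1 range.
unprintable = {0x7F: "DEL"}
scrub_c1_control_chars_sketch(unprintable)
```

After the call, `unprintable` holds its original entry plus 32 new integer keys, one for each C1 code point.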


function encoding_offsets

encoding_offsets(encoding: str) → list

Get possible offsets for a given encoding. If the encoding is not in WIDE_UTF_ENCODINGS, return [0].
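
As a rough illustration of why offsets matter: a multi-byte character in a wide encoding can begin at any byte position within its width, so a decode attempt may need to start at each candidate offset. The sketch below assumes that `WIDE_UTF_ENCODINGS` associates each wide encoding with a list of offsets. The dict `WIDE_UTF_OFFSETS`, the encoding name strings, and both function names are hypothetical stand-ins, not the module's actual definitions.

```python
# Hypothetical stand-in for WIDE_UTF_ENCODINGS; the real structure may differ.
WIDE_UTF_OFFSETS = {
    "utf-16": [0, 1],
    "utf-32": [0, 1, 2, 3],
}


def encoding_offsets_sketch(encoding: str) -> list:
    """Return the byte offsets worth trying for `encoding` ([0] if not wide)."""
    return WIDE_UTF_OFFSETS.get(encoding, [0])


def decode_at_offsets(data: bytes, encoding: str) -> dict:
    """Attempt a decode of `data` starting at each candidate offset."""
    return {
        offset: data[offset:].decode(encoding, errors="replace")
        for offset in encoding_offsets_sketch(encoding)
    }
```

For a single-byte or variable-width encoding such as UTF-8, `decode_at_offsets` makes one attempt at offset 0. For UTF-16 it makes two attempts, and for UTF-32 it makes four.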


function encoding_width

encoding_width(encoding: str) → int

Get the width of a character in bytes for a given encoding. The width is defined as the number of possible offsets for that encoding (see encoding_offsets).


function is_wide_utf

is_wide_utf(encoding: str) → bool

Check if the encoding is a wide UTF encoding (UTF-16 or UTF-32).
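
The relationship between the width and wide-UTF helpers can be sketched as follows. As before, `WIDE_UTF_OFFSETS`, the encoding name strings, and the `_sketch` function names are assumptions for illustration. Only the stated relationships (width equals the number of offsets, wide means UTF-16 or UTF-32) come from the descriptions above.

```python
# Hypothetical stand-in for WIDE_UTF_ENCODINGS; the real structure may differ.
WIDE_UTF_OFFSETS = {
    "utf-16": [0, 1],
    "utf-32": [0, 1, 2, 3],
}


def encoding_offsets_sketch(encoding: str) -> list:
    """Return the candidate offsets for `encoding` ([0] if not wide)."""
    return WIDE_UTF_OFFSETS.get(encoding, [0])


def encoding_width_sketch(encoding: str) -> int:
    """Width in bytes, defined as the number of candidate offsets."""
    return len(encoding_offsets_sketch(encoding))


def is_wide_utf_sketch(encoding: str) -> bool:
    """True only for the wide UTF encodings (UTF-16 or UTF-32)."""
    return encoding in WIDE_UTF_OFFSETS
```

Note that under this definition a non-wide encoding such as UTF-8 reports a width of 1, even though individual UTF-8 characters can span several bytes. The width here describes alignment offsets, not the maximum encoded length of a character.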


This file was automatically generated via lazydocs.