Module String.Ascii

US-ASCII string support.

The following functions act only on US-ASCII code points, that is, on bytes in the range [0x00;0x7F]. They can be safely used on UTF-8 encoded strings, but they only deal with US-ASCII related matters.

Predicates

val is_valid : string -> bool

is_valid s is true iff for all indices i of s, s.[i] is a US-ASCII character, i.e. a byte in the range [0x00;0x7F].
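
As an illustration, the predicate can be sketched with the standard library alone. The name is_valid_sketch is hypothetical and this is not the module's implementation; it only mirrors the behaviour described above, where a byte is US-ASCII when its code is at most 0x7F.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)
let is_valid_sketch (s : string) : bool =
  (* Walk the bytes and stop at the first one above 0x7F. *)
  let rec loop i =
    i >= String.length s || (Char.code s.[i] <= 0x7F && loop (i + 1))
  in
  loop 0

let () =
  assert (is_valid_sketch "");
  assert (is_valid_sketch "hello\x7F");
  (* "café" in UTF-8 contains the non US-ASCII bytes 0xC3 0xA9. *)
  assert (not (is_valid_sketch "caf\xC3\xA9"))
```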

Casing transforms

The functions can be safely used on UTF-8 encoded strings; they will of course only deal with US-ASCII casings.

val uppercase : string -> string

uppercase s is s with US-ASCII characters 'a' to 'z' mapped to 'A' to 'Z'.

val lowercase : string -> string

lowercase s is s with US-ASCII characters 'A' to 'Z' mapped to 'a' to 'z'.

val capitalize : string -> string

capitalize s is like Ascii.uppercase but performs the map only on s.[0].

val uncapitalize : string -> string

uncapitalize s is like Ascii.lowercase but performs the map only on s.[0].
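
The effect on UTF-8 encoded strings can be illustrated with the standard library functions String.uppercase_ascii, String.lowercase_ascii, String.capitalize_ascii and String.uncapitalize_ascii. Using them as stand-ins for the four functions above is an assumption made for this example; they perform the same US-ASCII only mapping as the one described here.

```ocaml
(* "été ok" encoded in UTF-8: 'é' is the two bytes 0xC3 0xA9. *)
let s = "\xC3\xA9t\xC3\xA9 ok"

(* Stdlib stand-ins (an assumption of this example, see the text above). *)
let upper = String.uppercase_ascii s
let lower = String.lowercase_ascii "ABC \xC3\x89"
let cap = String.capitalize_ascii "hello world"
let cap_utf8 = String.capitalize_ascii s
let uncap = String.uncapitalize_ascii "Hello World"

let () =
  (* Only 't', 'o' and 'k' are mapped, the bytes of 'é' are untouched. *)
  assert (upper = "\xC3\xA9T\xC3\xA9 OK");
  (* 'É' (0xC3 0x89) is not a US-ASCII letter and stays as is. *)
  assert (lower = "abc \xC3\x89");
  assert (cap = "Hello world");
  (* The first byte is 0xC3, so nothing changes. *)
  assert (cap_utf8 = s);
  assert (uncap = "hello World")
```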

Converting to US-ASCII hexadecimal characters

val to_hex : string -> string

to_hex s is the sequence of bytes of s as US-ASCII lowercase hexadecimal digits.
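
A minimal sketch of such a conversion, assuming two lowercase digits per byte with the high nibble first. The name to_hex_sketch is hypothetical and the code is not taken from the module.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)
let to_hex_sketch (s : string) : string =
  let digits = "0123456789abcdef" in
  let b = Bytes.create (2 * String.length s) in
  String.iteri
    (fun i c ->
      let n = Char.code c in
      (* High nibble first, then low nibble. *)
      Bytes.set b (2 * i) digits.[n lsr 4];
      Bytes.set b ((2 * i) + 1) digits.[n land 0xF])
    s;
  Bytes.to_string b

let () =
  assert (to_hex_sketch "" = "");
  assert (to_hex_sketch "Hi" = "4869");
  assert (to_hex_sketch "\x00\xFF" = "00ff")
```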

val of_hex' : string -> (string, int) Stdlib.result

of_hex' h parses a sequence of US-ASCII (lower or upper cased) hexadecimal digits from h into its corresponding byte sequence. Error n is returned with n either the index of a character of h that is not a hexadecimal digit or the length of h if there is a missing digit at the end.
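
The error convention can be sketched as follows. The names hex_value and of_hex_sketch are hypothetical, and scanning left to right so that the first problem encountered is the one reported is an assumption; the description above does not say which error wins when several apply.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)

(* Value of a hexadecimal digit, if [c] is one. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

let of_hex_sketch (h : string) : (string, int) result =
  let len = String.length h in
  let b = Buffer.create (len / 2) in
  let rec loop i =
    if i = len then Ok (Buffer.contents b)
    else
      match hex_value h.[i] with
      | None -> Error i (* not a hexadecimal digit *)
      | Some hi ->
          if i + 1 = len then Error len (* missing final digit *)
          else (
            match hex_value h.[i + 1] with
            | None -> Error (i + 1)
            | Some lo ->
                Buffer.add_char b (Char.chr ((hi lsl 4) lor lo));
                loop (i + 2))
  in
  loop 0

let () =
  assert (of_hex_sketch "4869" = Ok "Hi");
  assert (of_hex_sketch "4F6b" = Ok "Ok");
  assert (of_hex_sketch "48zz" = Error 2);
  assert (of_hex_sketch "486" = Error 3)
```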

val of_hex : string -> (string, string) Stdlib.result

of_hex is like of_hex' but errors with an English error message.

Converting to printable US-ASCII characters

val escape : string -> string

escape s escapes bytes of s to a representation that uses only US-ASCII printable characters. More precisely:

  • [0x20;0x5B] and [0x5D;0x7E] are left unchanged. These are the printable US-ASCII bytes, except '\\' (0x5C).
  • [0x00;0x1F], 0x5C and [0x7F;0xFF] are escaped by a hexadecimal "\xHH" escape with H an uppercase hexadecimal digit. These bytes are the US-ASCII control characters, the non US-ASCII bytes and '\\' (0x5C).

Use unescape to unescape. The invariant unescape (escape s) = Ok s holds.
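
The two rules above can be sketched with a Buffer. The name escape_sketch is hypothetical and the code is only a reading of the rules, not the module's implementation.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)
let escape_sketch (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      (* [0x20;0x5B] and [0x5D;0x7E]: printable, left unchanged. *)
      | ' ' .. '[' | ']' .. '~' -> Buffer.add_char b c
      (* Control bytes, 0x5C, 0x7F and non US-ASCII bytes. *)
      | _ -> Buffer.add_string b (Printf.sprintf "\\x%02X" (Char.code c)))
    s;
  Buffer.contents b

let () =
  (* The backslash, the newline and the byte 0xC3 are escaped. *)
  assert (escape_sketch "a\\b\n\xC3" = "a\\x5Cb\\x0A\\xC3");
  (* The double quote is printable and is left unchanged. *)
  assert (escape_sketch "\"hi\"" = "\"hi\"")
```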

val unescape : string -> (string, int) Stdlib.result

unescape s unescapes from s the escapes performed by escape. More precisely:

  • "\xHH" with H a lower or upper case hexadecimal digit is unescaped to the corresponding byte value.

Any other escape following a '\\' not defined above makes the function return Error i with i the index of the error in the string.
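
A minimal sketch of this decoding, with hypothetical names hex_value and unescape_sketch. The description does not say which index is reported on error; the sketch assumes it is the index of the '\\' that starts the invalid escape.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)

(* Value of a hexadecimal digit, if [c] is one. *)
let hex_value c =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

let unescape_sketch (s : string) : (string, int) result =
  let len = String.length s in
  let b = Buffer.create len in
  let rec loop i =
    if i >= len then Ok (Buffer.contents b)
    else if s.[i] <> '\\' then (
      Buffer.add_char b s.[i];
      loop (i + 1))
    else if i + 3 >= len || s.[i + 1] <> 'x' then
      (* Assumption: report the index of the backslash. *)
      Error i
    else
      match (hex_value s.[i + 2], hex_value s.[i + 3]) with
      | Some hi, Some lo ->
          Buffer.add_char b (Char.chr ((hi lsl 4) lor lo));
          loop (i + 4)
      | _ -> Error i
  in
  loop 0

let () =
  assert (unescape_sketch "a\\x5Cb\\x0a" = Ok "a\\b\n");
  (* "\\n" is not an escape produced by escape. *)
  assert (unescape_sketch "a\\nbcd" = Error 1);
  (* Truncated escape at the end of the string. *)
  assert (unescape_sketch "\\x4" = Error 0)
```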

val ocaml_string_escape : string -> string

ocaml_string_escape s escapes the bytes of s to a representation that uses only US-ASCII printable characters and according to OCaml's conventions for string literals. More precisely:

  • '\b' (0x08) is escaped to "\\b" (0x5C,0x62).
  • '\t' (0x09) is escaped to "\\t" (0x5C,0x74).
  • '\n' (0x0A) is escaped to "\\n" (0x5C,0x6E).
  • '\r' (0x0D) is escaped to "\\r" (0x5C,0x72).
  • '\"' (0x22) is escaped to "\\\"" (0x5C,0x22).
  • '\\' (0x5C) is escaped to "\\\\" (0x5C,0x5C).
  • 0x20, 0x21, [0x23;0x5B] and [0x5D;0x7E] are left unchanged. These are the printable US-ASCII bytes, except '\"' (0x22) and '\\' (0x5C).
  • Remaining bytes are escaped by a hexadecimal "\xHH" escape with H an uppercase hexadecimal digit. These bytes are the US-ASCII control characters not mentioned above and non US-ASCII bytes.

Use ocaml_unescape to unescape. The invariant ocaml_unescape (ocaml_string_escape s) = Ok s holds.
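
The rules above map directly onto a pattern match. The sketch below is a reading of the list, under the hypothetical name ocaml_string_escape_sketch, and is not the module's implementation.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)
let ocaml_string_escape_sketch (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '\b' -> Buffer.add_string b "\\b"
      | '\t' -> Buffer.add_string b "\\t"
      | '\n' -> Buffer.add_string b "\\n"
      | '\r' -> Buffer.add_string b "\\r"
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      (* Remaining printable bytes; '"' and '\\' were matched above. *)
      | ' ' .. '~' -> Buffer.add_char b c
      (* Other control bytes, 0x7F and non US-ASCII bytes. *)
      | _ -> Buffer.add_string b (Printf.sprintf "\\x%02X" (Char.code c)))
    s;
  Buffer.contents b

let () =
  assert (
    ocaml_string_escape_sketch "say \"hi\"\n\x00" = "say \\\"hi\\\"\\n\\x00");
  assert (ocaml_string_escape_sketch "a\\b" = "a\\\\b")
```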

val ocaml_unescape : string -> (string, int) Stdlib.result

ocaml_unescape s unescapes from s the escape sequences afforded by OCaml string and char literals. More precisely:

  • "\\b" (0x5C,0x62) is unescaped to '\b' (0x08).
  • "\\t" (0x5C,0x74) is unescaped to '\t' (0x09).
  • "\\n" (0x5C,0x6E) is unescaped to '\n' (0x0A).
  • "\\r" (0x5C,0x72) is unescaped to '\r' (0x0D).
  • "\\ " (0x5C,0x20) is unescaped to ' ' (0x20).
  • "\\\"" (0x5C,0x22) is unescaped to '\"' (0x22).
  • "\\'" (0x5C,0x27) is unescaped to '\'' (0x27).
  • "\\\\" (0x5C,0x5C) is unescaped to '\\' (0x5C).
  • "\xHH" with H a lower or upper case hexadecimal digit is unescaped to the corresponding byte value.
  • "\\DDD" with D a decimal digit, such that DDD denotes a byte value (at most 255), is unescaped to the corresponding byte value.
  • "\\oOOO" with O an octal digit is unescaped to the corresponding byte value.

Any other escape following a '\\' not defined above makes the function return Error i with i the location of the error in the string.
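
The escapes listed above can be decoded with a single left to right scan. The sketch below uses hypothetical names (digit_value, parse_digits, ocaml_unescape_sketch) and makes two assumptions that the description leaves open: numeric escapes whose value exceeds 0xFF are treated as errors, and the reported location is the index of the '\\' that starts the invalid escape.

```ocaml
(* Hypothetical reference sketch, not the module's actual code. *)

(* Value of digit [c] in [base] (8, 10 or 16), if [c] is such a digit. *)
let digit_value ~base c =
  let v =
    match c with
    | '0' .. '9' -> Char.code c - Char.code '0'
    | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
    | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
    | _ -> max_int
  in
  if v < base then Some v else None

(* Parses exactly [count] digits of [base] from index [start]. None on a
   missing digit, a non-digit, or a value above 0xFF (assumption). *)
let parse_digits s ~base ~start ~count =
  if start + count > String.length s then None
  else
    let rec loop i acc =
      if i = start + count then if acc <= 0xFF then Some acc else None
      else
        match digit_value ~base s.[i] with
        | None -> None
        | Some v -> loop (i + 1) ((acc * base) + v)
    in
    loop start 0

let ocaml_unescape_sketch (s : string) : (string, int) result =
  let len = String.length s in
  let b = Buffer.create len in
  let rec loop i =
    if i >= len then Ok (Buffer.contents b)
    else if s.[i] <> '\\' then (
      Buffer.add_char b s.[i];
      loop (i + 1))
    else if i + 1 >= len then Error i (* lone trailing backslash *)
    else
      (* Two-byte escape that denotes the byte [c]. *)
      let simple c =
        Buffer.add_char b c;
        loop (i + 2)
      in
      (* Numeric escape; errors point at the backslash (assumption). *)
      let numeric ~base ~start ~count =
        match parse_digits s ~base ~start ~count with
        | None -> Error i
        | Some v ->
            Buffer.add_char b (Char.chr v);
            loop (start + count)
      in
      match s.[i + 1] with
      | 'b' -> simple '\b'
      | 't' -> simple '\t'
      | 'n' -> simple '\n'
      | 'r' -> simple '\r'
      | (' ' | '"' | '\'' | '\\') as c -> simple c
      | 'x' -> numeric ~base:16 ~start:(i + 2) ~count:2
      | 'o' -> numeric ~base:8 ~start:(i + 2) ~count:3
      | '0' .. '9' -> numeric ~base:10 ~start:(i + 1) ~count:3
      | _ -> Error i
  in
  loop 0

let () =
  (* "\\x41", "\\065" and "\\o101" all denote 'A' (0x41). *)
  assert (ocaml_unescape_sketch "a\\tb\\x41\\065\\o101" = Ok "a\tbAAA");
  assert (ocaml_unescape_sketch "\\\"\\'\\ \\\\" = Ok "\"' \\");
  assert (ocaml_unescape_sketch "\\q" = Error 0);
  assert (ocaml_unescape_sketch "ab\\999" = Error 2)
```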