String.Ascii
US-ASCII string support.
The following functions act only on US-ASCII code points, that is, on the bytes in the range [0x00;0x7F]. The functions can be safely used on UTF-8 encoded strings, but they only deal with US-ASCII related matters.
is_valid s is true iff for all indices i of s, s.[i] is a US-ASCII character, i.e. a byte in the range [0x00;0x7F].
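The validity check can be sketched with the standard library alone. The name is_ascii_valid is hypothetical and only stands in for is_valid; this is an illustration of the rule (every byte is at most 0x7F), not the module's implementation.

```ocaml
(* Hypothetical stand-in for is_valid, standard library only. *)
let is_ascii_valid (s : string) : bool =
  let rec loop i =
    (* Stop at the end of the string or at the first byte above 0x7F. *)
    i >= String.length s || (Char.code s.[i] <= 0x7F && loop (i + 1))
  in
  loop 0

let () =
  assert (is_ascii_valid "hello");
  assert (is_ascii_valid "");
  (* "é" encoded in UTF-8 is the two bytes 0xC3 0xA9, both outside US-ASCII. *)
  assert (not (is_ascii_valid "h\xC3\xA9llo"))
```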
The functions can be safely used on UTF-8 encoded strings; they will of course only deal with US-ASCII casings.
uppercase s is s with US-ASCII characters 'a' to 'z' mapped to 'A' to 'Z'.
lowercase s is s with US-ASCII characters 'A' to 'Z' mapped to 'a' to 'z'.
capitalize s is like Ascii.uppercase but performs the map only on s.[0].
uncapitalize s is like Ascii.lowercase but performs the map only on s.[0].
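To illustrate what "only deal with US-ASCII casings" means on UTF-8 input, the following sketch uses the OCaml standard library functions String.uppercase_ascii, String.lowercase_ascii, String.capitalize_ascii and String.uncapitalize_ascii as stand-ins. They are assumed here to behave like the functions described above; they are not the functions of this module.

```ocaml
(* Standard library stand-ins for the casing functions described above. *)
let () =
  (* "é" is the two UTF-8 bytes 0xC3 0xA9; neither is a US-ASCII letter,
     so both are left untouched. *)
  let s = "h\xC3\xA9llo" in
  assert (String.uppercase_ascii s = "H\xC3\xA9LLO");
  assert (String.lowercase_ascii "ABC-123" = "abc-123");
  (* Only the first byte is mapped. *)
  assert (String.capitalize_ascii "hello world" = "Hello world");
  assert (String.uncapitalize_ascii "Hello World" = "hello World");
  (* A non US-ASCII first byte is left as is. *)
  assert (String.capitalize_ascii "\xC3\xA9t\xC3\xA9" = "\xC3\xA9t\xC3\xA9")
```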
to_hex s is the sequence of bytes of s as US-ASCII lowercase hexadecimal digits.
of_hex' h parses a sequence of US-ASCII (lower or upper cased) hexadecimal digits from h into its corresponding byte sequence. Error n is returned with n either the index of a character in h that is not a hexadecimal digit, or the length of h if a digit is missing at the end.
of_hex is of_hex' but errors with an English error message.
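The hexadecimal conversions can be sketched as follows, using only the standard library. The names to_hex_sketch, hex_value and of_hex_sketch are hypothetical. The order in which an invalid digit and a missing final digit are detected is an assumption of this sketch, not something the description specifies.

```ocaml
(* Hypothetical re-implementations for illustration, standard library only. *)

(* Each byte becomes two lowercase hexadecimal digits. *)
let to_hex_sketch (s : string) : string =
  let digits = "0123456789abcdef" in
  String.init (2 * String.length s) (fun i ->
      let b = Char.code s.[i / 2] in
      if i mod 2 = 0 then digits.[b lsr 4] else digits.[b land 0xF])

(* Value of a lower or upper case hexadecimal digit, if it is one. *)
let hex_value (c : char) : int option =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

let of_hex_sketch (h : string) : (string, int) result =
  let len = String.length h in
  let buf = Buffer.create (len / 2) in
  let rec loop i =
    if i >= len then Ok (Buffer.contents buf)
    else
      match hex_value h.[i] with
      | None -> Error i (* index of the offending character *)
      | Some hi ->
          if i + 1 >= len then Error len (* missing final digit *)
          else (
            match hex_value h.[i + 1] with
            | None -> Error (i + 1)
            | Some lo ->
                Buffer.add_char buf (Char.chr ((hi lsl 4) lor lo));
                loop (i + 2))
  in
  loop 0

let () =
  assert (to_hex_sketch "\x00\xffAb" = "00ff4162");
  assert (of_hex_sketch "00FF4162" = Ok "\x00\xffAb");
  assert (of_hex_sketch "0g" = Error 1);
  assert (of_hex_sketch "abc" = Error 3)
```

Returning the raw index keeps the error machine-readable; a wrapper in the style of of_hex would turn it into a message.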
escape s escapes bytes of s to a representation that uses only US-ASCII printable characters. More precisely:
- [0x20;0x5B] and [0x5D;0x7E] are left unchanged. These are the printable US-ASCII bytes, except '\\' (0x5C).
- [0x00;0x1F], 0x5C and [0x7F;0xFF] are escaped by a hexadecimal "\xHH" escape with H a capital hexadecimal digit. These bytes are the US-ASCII control characters, the non US-ASCII bytes and '\\' (0x5C).
Use unescape to unescape. The invariant unescape (escape s) = Ok s holds.
unescape s unescapes from s the escapes performed by escape. More precisely:
- "\xHH" with H a lower or upper case hexadecimal digit is unescaped to the corresponding byte value.
- Any other escape following a '\\' not defined above makes the function return Error i with i the index of the error in the string.
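The escape rules and the round-trip invariant can be sketched with the standard library alone. The names escape_sketch, hex_digit and unescape_sketch are hypothetical. Reporting the index of the backslash that starts a bad escape is an assumption of this sketch; the description only says that the index locates the error.

```ocaml
(* Hypothetical re-implementations for illustration, standard library only. *)

let escape_sketch (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      (* [0x20;0x5B] and [0x5D;0x7E] are left unchanged. *)
      | ' ' .. '[' | ']' .. '~' -> Buffer.add_char buf c
      (* Everything else, including '\\', becomes \xHH with capital digits. *)
      | _ -> Buffer.add_string buf (Printf.sprintf "\\x%02X" (Char.code c)))
    s;
  Buffer.contents buf

let hex_digit (c : char) : int option =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

let unescape_sketch (s : string) : (string, int) result =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec loop i =
    if i >= len then Ok (Buffer.contents buf)
    else if s.[i] <> '\\' then (
      Buffer.add_char buf s.[i];
      loop (i + 1))
    else if i + 3 < len && s.[i + 1] = 'x' then (
      match (hex_digit s.[i + 2], hex_digit s.[i + 3]) with
      | Some hi, Some lo ->
          Buffer.add_char buf (Char.chr ((hi lsl 4) lor lo));
          loop (i + 4)
      | _ -> Error i (* assumption: report the backslash index *))
    else Error i
  in
  loop 0

let () =
  let s = "a\\b\n\xff~" in
  assert (escape_sketch "a\\b\n\xff" = "a\\x5Cb\\x0A\\xFF");
  (* The round-trip invariant stated above. *)
  assert (unescape_sketch (escape_sketch s) = Ok s);
  assert (unescape_sketch "\\q" = Error 0)
```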
ocaml_string_escape s escapes the bytes of s to a representation that uses only US-ASCII printable characters and according to OCaml's conventions for string literals. More precisely:
- '\b' (0x08) is escaped to "\\b" (0x5C,0x62).
- '\t' (0x09) is escaped to "\\t" (0x5C,0x74).
- '\n' (0x0A) is escaped to "\\n" (0x5C,0x6E).
- '\r' (0x0D) is escaped to "\\r" (0x5C,0x72).
- '\"' (0x22) is escaped to "\\\"" (0x5C,0x22).
- '\\' (0x5C) is escaped to "\\\\" (0x5C,0x5C).
- 0x20, 0x21, [0x23;0x5B] and [0x5D;0x7E] are left unchanged. These are the printable US-ASCII bytes, except '\"' (0x22) and '\\' (0x5C).
- Any other byte is escaped by a hexadecimal "\xHH" escape with H an uppercase hexadecimal digit. These bytes are the US-ASCII control characters not mentioned above and non US-ASCII bytes.
Use ocaml_unescape to unescape. The invariant ocaml_unescape (ocaml_string_escape s) = Ok s holds.
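The escaping table above translates fairly directly into a pattern match. The following is a minimal sketch using only the standard library; the name ocaml_escape_sketch is hypothetical and the code is not the module's implementation.

```ocaml
(* Hypothetical re-implementation for illustration, standard library only. *)
let ocaml_escape_sketch (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '\b' -> Buffer.add_string buf "\\b"
      | '\t' -> Buffer.add_string buf "\\t"
      | '\n' -> Buffer.add_string buf "\\n"
      | '\r' -> Buffer.add_string buf "\\r"
      | '"' -> Buffer.add_string buf "\\\""
      | '\\' -> Buffer.add_string buf "\\\\"
      (* Remaining printable US-ASCII bytes; '"' and '\\' were handled above. *)
      | ' ' .. '~' -> Buffer.add_char buf c
      (* Other control characters and non US-ASCII bytes. *)
      | _ -> Buffer.add_string buf (Printf.sprintf "\\x%02X" (Char.code c)))
    s;
  Buffer.contents buf

let () =
  assert (
    ocaml_escape_sketch "say \"hi\"\n\x00\xe9"
    = "say \\\"hi\\\"\\n\\x00\\xE9")
```

The order of the cases matters: the double quote and the backslash lie inside the printable range, so they must be matched before the range case.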
ocaml_unescape s unescapes from s the escape sequences afforded by OCaml string and char literals. More precisely:
- "\\b" (0x5C,0x62) is unescaped to '\b' (0x08).
- "\\t" (0x5C,0x74) is unescaped to '\t' (0x09).
- "\\n" (0x5C,0x6E) is unescaped to '\n' (0x0A).
- "\\r" (0x5C,0x72) is unescaped to '\r' (0x0D).
- "\\ " (0x5C,0x20) is unescaped to ' ' (0x20).
- "\\\"" (0x5C,0x22) is unescaped to '\"' (0x22).
- "\\'" (0x5C,0x27) is unescaped to '\'' (0x27).
- "\\\\" (0x5C,0x5C) is unescaped to '\\' (0x5C).
- "\xHH" with H a lower or upper case hexadecimal number is unescaped to the corresponding byte value.
- "\\DDD" with D a decimal number such that DDD is unescaped to the corresponding byte value.
- "\\oOOO" with O an octal number is unescaped to the corresponding byte value.
- Any other escape following a '\\' not defined above makes the function return Error i with i the location of the error in the string.
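A decoder for the escapes listed above can be sketched as follows, using only the standard library. The names hex_digit, digits and ocaml_unescape_sketch are hypothetical. Two points are assumptions of this sketch rather than documented behaviour: numeric escapes whose value exceeds 0xFF are rejected, and the reported error index is that of the backslash starting the bad escape.

```ocaml
(* Hypothetical re-implementation for illustration, standard library only. *)

let hex_digit (c : char) : int option =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | _ -> None

(* Value of the [n] digits of [s] starting at [i] in the given [base], or
   [None] if a digit is missing or is not a digit of that base. *)
let digits ~base (s : string) (i : int) (n : int) : int option =
  let rec loop acc k =
    if k = n then Some acc
    else if i + k >= String.length s then None
    else
      match hex_digit s.[i + k] with
      | Some d when d < base -> loop ((acc * base) + d) (k + 1)
      | _ -> None
  in
  loop 0 0

let ocaml_unescape_sketch (s : string) : (string, int) result =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec loop i =
    if i >= len then Ok (Buffer.contents buf)
    else if s.[i] <> '\\' then (
      Buffer.add_char buf s.[i];
      loop (i + 1))
    else if i + 1 >= len then Error i (* lone trailing backslash *)
    else
      let simple c =
        (Buffer.add_char buf c;
         loop (i + 2))
      in
      let numeric ~base ~start ~count =
        match digits ~base s start count with
        | Some v when v <= 0xFF ->
            (* Assumption: values above 0xFF are treated as errors. *)
            Buffer.add_char buf (Char.chr v);
            loop (start + count)
        | _ -> Error i
      in
      match s.[i + 1] with
      | 'b' -> simple '\b'
      | 't' -> simple '\t'
      | 'n' -> simple '\n'
      | 'r' -> simple '\r'
      | (' ' | '"' | '\'' | '\\') as c -> simple c
      | 'x' -> numeric ~base:16 ~start:(i + 2) ~count:2
      | 'o' -> numeric ~base:8 ~start:(i + 2) ~count:3
      | '0' .. '9' -> numeric ~base:10 ~start:(i + 1) ~count:3
      | _ -> Error i
  in
  loop 0

let () =
  (* \x41, \065 and \o101 all denote the byte 0x41, i.e. 'A'. *)
  assert (ocaml_unescape_sketch "a\\tb\\x41\\065\\o101" = Ok "a\tbAAA");
  assert (ocaml_unescape_sketch "\\q" = Error 0);
  assert (ocaml_unescape_sketch "ab\\" = Error 2)
```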