Serialkit_text.Tdec
UTF-8 text decoder.
A decoder inputs valid UTF-8 text, maintains text locations according to advances in the input and has a lexeme buffer for lexing.
val from_string : ?file:Tloc.fpath -> string -> t
from_string ~file s decodes s using file (defaults to Tloc.no_file) for text locations.
val file : t -> Tloc.fpath
file d is the input file.
val line : t -> Tloc.line_pos
line d is the current decoding line position. Lines increment as described here.
val loc :
t ->
first_byte:Tloc.pos ->
last_byte:Tloc.pos ->
first_line:Tloc.line_pos ->
last_line:Tloc.line_pos ->
Tloc.t
loc d ~first_byte ~last_byte ~first_line ~last_line is Tloc.v using file d for the file.
val loc_to_here :
t ->
first_byte:Tloc.pos ->
first_line:Tloc.line_pos ->
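Tloc.t
loc_to_here d ~first_byte ~first_line is loc with the end position at the current decoding position as determined by pos and line.
val loc_here : t -> Tloc.t
loc_here d is like loc_to_here with the start position at the current decoding position as determined by pos and line.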
exception Err of Tloc.t * string
The exception for errors: a location and an English error message.
val err : Tloc.t -> string -> 'b
err loc msg raises Err (loc, msg) with no trace.
val err_to_here :
t ->
first_byte:Tloc.pos ->
first_line:Tloc.line_pos ->
('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 ->
'a
err_to_here d ~first_byte ~first_line fmt ... is err (loc_to_here d ~first_byte ~first_line) msg, where msg is the message formatted by fmt ....
val err_here :
t ->
('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 ->
'a
err_here d fmt ... is err (loc_here d) msg, where msg is the message formatted by fmt ....
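The error functions above combine a location, a format string and a trace-free raise. The following is a minimal standalone sketch of that pattern, not the library's code: the loc record is a hypothetical stand-in for Tloc.t, and err_at is a hypothetical name that plays the role of err_here with the location passed explicitly.

```ocaml
(* Toy stand-in for Tloc.t (hypothetical): a file name and a line number. *)
type loc = { file : string; line : int }

exception Err of loc * string

(* Raise without recording a backtrace, as the documentation of err describes. *)
let err loc msg = raise_notrace (Err (loc, msg))

(* Format the message first, then raise it at the given location. *)
let err_at loc fmt = Format.kasprintf (fun msg -> err loc msg) fmt

let () =
  let loc = { file = "conf.txt"; line = 3 } in
  match err_at loc "unexpected byte 0x%02X" 0x3B with
  | () -> ()
  | exception Err (l, msg) -> Printf.printf "%s:%d: %s\n" l.file l.line msg
```

Format.kasprintf builds the message string and hands it to a continuation, which is what lets a function with a format4 argument end in a raise while still accepting the format arguments.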
err_suggest ~dist candidates s are the elements of candidates whose edit distance to s is the smallest and is at most dist (defaults to 2). If multiple results are returned the order of candidates is preserved.
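The selection rule can be sketched in standalone OCaml as follows. This is an illustration under assumptions, not the library's implementation: the names edit_distance and suggest are hypothetical, and the distance used here is the plain byte-wise Levenshtein distance, which may differ from the metric the library uses.

```ocaml
(* Levenshtein edit distance between two strings (byte-wise), computed with
   two rows of the dynamic programming table. *)
let edit_distance a b =
  let la = String.length a and lb = String.length b in
  let prev = Array.init (lb + 1) (fun j -> j) in
  let curr = Array.make (lb + 1) 0 in
  for i = 1 to la do
    curr.(0) <- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      curr.(j) <-
        min (min (prev.(j) + 1) (curr.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit curr 0 prev 0 (lb + 1)
  done;
  prev.(lb)

(* Keep the candidates at the smallest distance from s, provided that
   distance is at most dist. List.filter_map preserves candidate order. *)
let suggest ?(dist = 2) candidates s =
  let scored = List.map (fun c -> (edit_distance c s, c)) candidates in
  let best = List.fold_left (fun m (d, _) -> min m d) max_int scored in
  if best > dist then []
  else List.filter_map (fun (d, c) -> if d = best then Some c else None) scored

let () =
  List.iter print_endline (suggest [ "cat"; "bat"; "dog" ] "hat")
```

In this sketch, "cat" and "bat" are both one edit away from "hat", so both are kept in their original order, while "dog" is dropped.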
and_enum ~empty pp_v ppf l formats l according to its length:
0, formats empty (defaults to nop).
1, formats the element with pp_v.
2, formats "%a and %a" with the list elements.
n, formats "%a, ... and %a" with the list elements.
or_enum is like and_enum but uses "or" instead of "and".
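The length-based cases can be illustrated with a small standalone formatter built on Stdlib.Format. This is a sketch that follows the description above, not the library's code; the shared helper enum and its word parameter are hypothetical.

```ocaml
(* Hypothetical helper: the conjunction word is a parameter so that the
   "and" and "or" variants share one body. *)
let enum word ?(empty = fun _ () -> ()) pp_v ppf l =
  let rec loop = function
    | [] -> empty ppf ()
    | [ v ] -> pp_v ppf v
    | [ v0; v1 ] -> Format.fprintf ppf "%a %s %a" pp_v v0 word pp_v v1
    | v :: vs -> Format.fprintf ppf "%a, " pp_v v; loop vs
  in
  loop l

let and_enum ?empty pp_v ppf l = enum "and" ?empty pp_v ppf l
let or_enum ?empty pp_v ppf l = enum "or" ?empty pp_v ppf l

let () =
  let show l = Format.asprintf "%a" (and_enum Format.pp_print_string) l in
  List.iter print_endline
    [ show [ "a" ]; show [ "a"; "b" ]; show [ "a"; "b"; "c" ] ]
```

Lists longer than two elements emit a comma-separated prefix and fall through to the two-element case for the final pair.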
did_you_mean pp_v formats "Did you mean %a ?" with or_enum if the list is non-empty and nop otherwise.
must_be pp_v formats "Must be %a." with or_enum if the list is non-empty and nop otherwise.
unknown ~kind pp_v formats "Unknown %a %a." kind () pp_v.
unknown ~kind pp_v ~hint (v, hints) formats unknown followed by a space and hint pp_v hints if hints is non-empty.
val eoi : t -> bool
eoi d is true iff the decoder is at the end of input.
val byte : t -> int
byte d is the byte at the current position or 0xFFFF if eoi d is true.
val accept_uchar : t -> unit
accept_uchar d accepts a UTF-8 encoded character starting at the current position and moves to the byte location after it. Raises Err in case of UTF-8 decoding error.
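To make the "moves to the byte location after it" step concrete, here is a standalone sketch of advancing over one UTF-8 encoded character with the standard library decoder. It assumes OCaml 4.14 or later for String.get_utf_8_uchar; next_pos and Decode_error are hypothetical names, and the library may decode differently.

```ocaml
(* Hypothetical error carrying the byte position of the decoding error. *)
exception Decode_error of int

(* Return the position just after the UTF-8 encoded character that starts
   at pos in s, or raise if the bytes there are not valid UTF-8. *)
let next_pos s pos =
  let d = String.get_utf_8_uchar s pos in
  if Uchar.utf_decode_is_valid d then pos + Uchar.utf_decode_length d
  else raise (Decode_error pos)

let () =
  let s = "a\xC3\xA9z" in (* 'a', U+00E9 encoded on two bytes, 'z' *)
  let p1 = next_pos s 0 in
  let p2 = next_pos s p1 in
  Printf.printf "%d %d\n" p1 p2
```

The first call advances by one byte over the ASCII letter and the second by two bytes over the encoded U+00E9.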
val accept_byte : t -> unit
accept_byte d accepts the byte at the current position and moves to the byte location after it. Warning. Faster than accept_uchar but the client needs to make sure it's not accepting invalid UTF-8 data, i.e. that byte d is a US-ASCII encoded character (<= 0x7F).
val lex_clear : t -> unit
lex_clear d sets the lexeme to the empty string.
val lex_add_byte : t -> int -> unit
lex_add_byte d b adds byte b to the lexeme.
val lex_add_bytes : t -> string -> unit
lex_add_bytes d s adds bytes s to the lexeme.
val lex_add_char : t -> char -> unit
lex_add_char d c adds character c to the lexeme.
val lex_add_uchar : t -> Stdlib.Uchar.t -> unit
lex_add_uchar d u adds the UTF-8 encoding of character u to the lexeme.
val lex_accept_uchar : t -> unit
lex_accept_uchar d is like accept_uchar but also adds the UTF-8 byte sequence to the lexeme.
val lex_accept_byte : t -> unit
lex_accept_byte d is like accept_byte but also adds the byte to the lexeme. Warning. accept_byte's warning applies.
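The way eoi, byte, the accept functions and the lexeme buffer fit together in a lexing loop can be sketched with a toy model. The Toy module below only mimics the documented names and is not the library's implementation: it handles ASCII input only, omits text locations, and its lexeme accessor is a hypothetical addition for reading the buffer back.

```ocaml
(* Toy model of the documented interface, NOT the library's implementation. *)
module Toy = struct
  type t = { i : string; mutable pos : int; lex : Buffer.t }
  let from_string s = { i = s; pos = 0; lex = Buffer.create 16 }
  let eoi d = d.pos >= String.length d.i
  let byte d = if eoi d then 0xFFFF else Char.code d.i.[d.pos]
  let accept_byte d = d.pos <- d.pos + 1
  let lex_clear d = Buffer.clear d.lex
  let lex_add_byte d b = Buffer.add_char d.lex (Char.chr b)
  let lex_accept_byte d = lex_add_byte d (byte d); accept_byte d
  let lexeme d = Buffer.contents d.lex (* hypothetical accessor *)
end

let is_lower b = 0x61 <= b && b <= 0x7A

(* Split the input into runs of lowercase ASCII letters, skipping the rest. *)
let words s =
  let d = Toy.from_string s in
  let rec skip () =
    if not (Toy.eoi d || is_lower (Toy.byte d)) then (Toy.accept_byte d; skip ())
  in
  let rec word () =
    (* At end of input byte is 0xFFFF, which is not a letter: the loop stops. *)
    if is_lower (Toy.byte d) then (Toy.lex_accept_byte d; word ())
  in
  let rec loop acc =
    skip ();
    if Toy.eoi d then List.rev acc
    else (Toy.lex_clear d; word (); loop (Toy.lexeme d :: acc))
  in
  loop []

let () = List.iter print_endline (words "foo, bar 42 baz")
```

Because byte returns the out-of-range value 0xFFFF at the end of input, the inner loop needs no separate eoi test: the sentinel never matches a byte predicate.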