Module Serialk_text.Tdec

Text decoder.

A text decoder inputs UTF-8 data and checks its validity. It updates locations according to advances in the input and has a token buffer used for lexing.

Decoder

type t

The type for UTF-8 text decoders.

val create : ?file:Tloc.fpath -> string -> t

create ~file input is a decoder for input that uses file (defaults to Tloc.no_file) for text locations.

Locations

val file : t -> Tloc.fpath

file d is the input file.

val pos : t -> Tloc.pos

pos d is the current decoding byte position.

val line : t -> Tloc.line_pos

line d is the current line position. Lines increment as described here.

val loc : t -> sbyte:Tloc.pos -> ebyte:Tloc.pos -> sline:Tloc.line_pos -> eline:Tloc.line_pos -> Tloc.t

loc d ~sbyte ~ebyte ~sline ~eline is a location with the corresponding position ranges and the file given by file d.

val loc_to_here : t -> sbyte:Tloc.pos -> sline:Tloc.line_pos -> Tloc.t

loc_to_here d ~sbyte ~sline is a location that starts at sbyte and sline and ends at the current decoding position.

val loc_here : t -> Tloc.t

loc_here d is like loc_to_here with the start position at the current decoding position.

Errors

exception Err of Tloc.t * string

The exception for errors: a location and an error message.

val err : Tloc.t -> string -> 'b

err loc msg raises Err (loc, msg) with no trace.

val err_to_here : t -> sbyte:Tloc.pos -> sline:Tloc.line_pos -> ('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a

err_to_here d ~sbyte ~sline fmt ... is err (loc_to_here d ~sbyte ~sline) msg, where msg is the message formatted by fmt ...

val err_here : t -> ('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a

err_here d fmt ... is err (loc_here d) msg, where msg is the message formatted by fmt ...
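The following is a minimal sketch of how functions with these signatures can be built from the standard library alone. The loc record is a hypothetical stand-in for Tloc.t, and err_at is an illustrative name, not part of the module. It only assumes that "no trace" corresponds to raise_notrace and that the message is formatted before the exception is raised.

```ocaml
(* Hypothetical stand-in for Tloc.t: only a byte range, for illustration. *)
type loc = { sbyte : int; ebyte : int }

exception Err of loc * string

(* Raise without recording a backtrace (assumed meaning of "no trace"). *)
let err loc msg = raise_notrace (Err (loc, msg))

(* Format the message first, then hand it to [err]. The resulting type is
   loc -> ('a, Format.formatter, unit, 'b) format4 -> 'a. *)
let err_at loc fmt = Format.kasprintf (fun msg -> err loc msg) fmt

(* A check that fails with a formatted message. *)
let check_digit loc c =
  if c < '0' || c > '9' then err_at loc "expected a digit, found %C" c

(* Clients recover the location and the message by matching on [Err]. *)
let result =
  try check_digit { sbyte = 3; ebyte = 3 } 'x'; "ok"
  with Err (l, msg) -> Printf.sprintf "%d-%d: %s" l.sbyte l.ebyte msg
```

Using a continuation-based formatter such as Format.kasprintf is what lets the error functions accept a format string and its arguments while still returning any type.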

Error message helpers

val err_suggest : ?dist:int -> string list -> string -> string list

err_suggest ~dist candidates s are the elements of candidates with the smallest edit distance to s, provided that distance is at most dist (defaults to 2). If multiple results are returned the order of candidates is preserved.
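The selection rule can be sketched as follows. This is an illustrative reimplementation, not the library's code: it assumes plain Levenshtein distance over bytes, and the names edit_distance and suggest are hypothetical.

```ocaml
(* Levenshtein distance between [a] and [b], computed row by row. *)
let edit_distance a b =
  let la = String.length a and lb = String.length b in
  let prev = Array.init (lb + 1) (fun j -> j) in
  let cur = Array.make (lb + 1) 0 in
  for i = 1 to la do
    cur.(0) <- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      cur.(j) <- min (min (prev.(j) + 1) (cur.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit cur 0 prev 0 (lb + 1)
  done;
  prev.(lb)

(* Keep the candidates at the smallest distance to [s], if it is <= [dist].
   The order of [candidates] is preserved. *)
let suggest ?(dist = 2) candidates s =
  let scored = List.map (fun c -> (edit_distance c s, c)) candidates in
  let best = List.fold_left (fun acc (d, _) -> min acc d) max_int scored in
  if best > dist then []
  else List.filter_map (fun (d, c) -> if d = best then Some c else None) scored

let hints = suggest [ "bar"; "baz"; "foo" ] "bat" (* ["bar"; "baz"] *)
let none = suggest ~dist:1 [ "hello" ] "xyz" (* [] *)
```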

val err_did_you_mean : ?pre:(Stdlib.Format.formatter -> unit -> unit) -> ?post:(Stdlib.Format.formatter -> unit -> unit) -> kind:string -> (Stdlib.Format.formatter -> 'a -> unit) -> Stdlib.Format.formatter -> ('a * 'a list) -> unit

err_did_you_mean ~pre ~kind ~post pp_v formats a faulty value v of kind kind and a list of hints that v could have been mistaken for.

pre defaults to unit "Unknown" and post to nop. They surround the faulty value before the "did you mean" part as follows: "%a %s %a%a." pre () kind pp_v v post (). If hints is empty no "did you mean" part is printed.
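As a rough sketch of that layout, the function below reproduces the documented "%a %s %a%a." prefix with the Format module. The wording of the hint part (" Did you mean ...?") is an assumption made for illustration, since the actual text is not given here. The names did_you_mean, pp_unknown and pp_nop are hypothetical.

```ocaml
(* Defaults mirroring the documented ones: "Unknown" before, nothing after. *)
let pp_unknown ppf () = Format.pp_print_string ppf "Unknown"
let pp_nop _ppf () = ()

let did_you_mean ?(pre = pp_unknown) ?(post = pp_nop) ~kind pp_v ppf (v, hints) =
  (* Documented part: pre, kind, faulty value, post, then a full stop. *)
  Format.fprintf ppf "%a %s %a%a." pre () kind pp_v v post ();
  match hints with
  | [] -> () (* no hints: nothing more is printed *)
  | hints ->
      (* Assumed wording for the hint part, for illustration only. *)
      let pp_sep ppf () = Format.pp_print_string ppf ", " in
      Format.fprintf ppf " Did you mean %a?" (Format.pp_print_list ~pp_sep pp_v) hints

let with_hint =
  Format.asprintf "%a"
    (did_you_mean ~kind:"key" Format.pp_print_string)
    ("colr", [ "color" ])

let without_hint =
  Format.asprintf "%a"
    (did_you_mean ~kind:"key" Format.pp_print_string)
    ("colr", [])
```

Here with_hint is "Unknown key colr. Did you mean color?" and without_hint is "Unknown key colr.".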

Decoding

val eoi : t -> bool

eoi d is true iff the decoder is at the end of input.

val byte : t -> int

byte d is the byte at the current position, or 0xFFFF if eoi d is true.

val accept_uchar : t -> unit

accept_uchar d accepts a UTF-8 encoded character starting at the current position and moves to the byte after it. Raises Err in case of UTF-8 decoding error.

val accept_byte : t -> unit

accept_byte d accepts the byte at the current position and moves to the next byte. Warning. Faster than accept_uchar, but the client needs to make sure it is not accepting invalid UTF-8 data, that is, that byte d is a US-ASCII encoded character (<= 0x7F).
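The interplay between the fast and the validating path can be sketched with a small stand-in decoder. The dec record, the simplified Err exception (carrying a byte position instead of a Tloc.t) and count_chars are hypothetical and only mimic the documented behaviour. The sketch assumes OCaml 4.14 or later for String.get_utf_8_uchar.

```ocaml
(* Hypothetical stand-in for a decoder: the input and a byte position. *)
type dec = { i : string; mutable pos : int }

exception Err of int * string (* byte position, message *)

let eoi d = d.pos >= String.length d.i
let byte d = if eoi d then 0xFFFF else Char.code d.i.[d.pos]

(* Fast path: only sound when [byte d] is US-ASCII (<= 0x7F). *)
let accept_byte d = d.pos <- d.pos + 1

(* Validating path: decode one character, fail on malformed input. *)
let accept_uchar d =
  let u = String.get_utf_8_uchar d.i d.pos in
  if not (Uchar.utf_decode_is_valid u)
  then raise (Err (d.pos, "UTF-8 decoding error"))
  else d.pos <- d.pos + Uchar.utf_decode_length u

(* Count characters, taking the fast path for ASCII bytes only. *)
let count_chars s =
  let d = { i = s; pos = 0 } in
  let n = ref 0 in
  while not (eoi d) do
    if byte d <= 0x7F then accept_byte d else accept_uchar d;
    incr n
  done;
  !n

(* "a", U+00E9 (2 bytes) and U+20AC (3 bytes): three characters. *)
let three = count_chars "a\xC3\xA9\xE2\x82\xAC"
```

Since 0xFFFF is not a valid byte value, a loop can test byte d against expected bytes without a separate eoi check.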

Token buffer

val tok_reset : t -> unit

tok_reset d resets the token.

val tok_pop : t -> string

tok_pop d returns the token and resets it (see tok_reset).

val tok_accept_uchar : t -> unit

tok_accept_uchar d is like accept_uchar but also adds the UTF-8 byte sequence to the token.

val tok_accept_byte : t -> unit

tok_accept_byte d is like accept_byte but also adds the byte to the token. Warning. accept_byte's warning applies.

val tok_add_byte : t -> int -> unit

tok_add_byte d b adds byte b to the token.

val tok_add_bytes : t -> string -> unit

tok_add_bytes d s adds bytes s to the token.

val tok_add_char : t -> char -> unit

tok_add_char d c adds character c to the token.

val tok_add_uchar : t -> Stdlib.Uchar.t -> unit

tok_add_uchar d u adds the UTF-8 encoding of character u to the token.
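A typical lexing loop accumulates input into the token and then pops it. The sketch below shows that pattern with a hypothetical stand-in decoder backed by a Buffer.t. The dec record, make, is_letter and lex_word are illustrative names, and the real decoder's internals may differ.

```ocaml
(* Hypothetical stand-in: input, byte position and a token buffer. *)
type dec = { i : string; mutable pos : int; tok : Buffer.t }

let make i = { i; pos = 0; tok = Buffer.create 16 }
let eoi d = d.pos >= String.length d.i
let byte d = if eoi d then 0xFFFF else Char.code d.i.[d.pos]
let accept_byte d = d.pos <- d.pos + 1

let tok_reset d = Buffer.clear d.tok
let tok_pop d = let t = Buffer.contents d.tok in tok_reset d; t

(* Accept the current byte and append it to the token (ASCII only). *)
let tok_accept_byte d =
  Buffer.add_char d.tok d.i.[d.pos];
  d.pos <- d.pos + 1

(* Append the UTF-8 encoding of [u] to the token. *)
let tok_add_uchar d u = Buffer.add_utf_8_uchar d.tok u

let is_letter b = (0x61 <= b && b <= 0x7A) || (0x41 <= b && b <= 0x5A)

(* Lex a run of ASCII letters into a token and return it. *)
let lex_word d =
  while is_letter (byte d) do tok_accept_byte d done;
  tok_pop d

let d = make "foo bar"
let first = lex_word d (* "foo" *)
let () = accept_byte d (* skip the space without adding it to the token *)
let second = lex_word d (* "bar" *)
let () = tok_add_uchar d (Uchar.of_int 0x00E9)
let e_acute = tok_pop d (* "\xC3\xA9" *)
```

Adding characters with tok_add_uchar rather than accepting them from the input is useful when the token differs from the source text, for example when a lexer resolves an escape sequence.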