Module Serialk_tlex.Tdec

Text decoder.

A text decoder reads UTF-8 data and checks its validity. It tracks locations as it advances through the input and maintains a token buffer used for lexing.

Decoder

type t

The type for UTF-8 text decoders.

val create : ?file:Tloc.fpath -> string -> t

create ~file input is a decoder for input that uses file (defaults to Tloc.no_file) for text locations.

Locations

val file : t -> Tloc.fpath

file d is the input file.

val pos : t -> Tloc.pos

pos d is the current decoding position.

val line : t -> Tloc.pos * Tloc.line

line d is the current line position. Lines increment as described here.

val loc_to_here : t -> byte_s:Tloc.pos -> line_s:(Tloc.pos * Tloc.line) -> Tloc.t

loc_to_here d ~byte_s ~line_s is a location that starts at byte_s and line_s and ends at the current decoding position.

val loc_here : t -> Tloc.t

loc_here d is like loc_to_here with the start position at the current decoding position.

val loc : t -> byte_s:Tloc.pos -> byte_e:Tloc.pos -> line_s:(Tloc.pos * Tloc.line) -> line_e:(Tloc.pos * Tloc.line) -> Tloc.t

loc d ~byte_s ~byte_e ~line_s ~line_e is a location with the corresponding position range.

Errors

exception Err of Tloc.t * string

The exception for errors. It carries a location and an error message.

val err : Tloc.t -> string -> 'b

err loc msg raises Err (loc, msg) with no trace.

val err_to_here : t -> byte_s:Tloc.pos -> line_s:(Tloc.pos * Tloc.line) -> ('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a

err_to_here d ~byte_s ~line_s fmt ... raises Err with no trace. The location spans from the given start position to the current decoding position and the message is formatted according to fmt.

val err_here : t -> ('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a

err_here d is like err_to_here with the start position at the current decoding position.
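
The error functions above combine a format string with a raise that records no trace. The following is a minimal, stdlib-only sketch of that pattern, not the library's implementation: the loc type is a hypothetical stand-in for Tloc.t, and errf and describe are illustrative names.

```ocaml
(* Hypothetical stand-in for Tloc.t: a (start, end) byte range. *)
type loc = int * int

exception Err of loc * string

(* Raise without recording a backtrace, which is one way to read
   "with no trace". *)
let err loc msg = raise_notrace (Err (loc, msg))

(* Format the message first, then raise. The format4 type lets the
   caller pass exactly the arguments that fmt requires. *)
let errf loc fmt = Format.kasprintf (fun msg -> err loc msg) fmt

(* Usage: catch the exception and recover both components. *)
let describe () =
  try errf (4, 7) "unexpected character %C" 'x' with
  | Err ((s, e), msg) -> Printf.sprintf "%d-%d: %s" s e msg
```

Because errf never returns normally, its result type is polymorphic and it can be used in any expression position, which is why err has the result type 'b.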

Error message helpers

val err_suggest : ?dist:int -> string list -> string -> string list

err_suggest ~dist candidates s are the elements of candidates that have the smallest edit distance to s, provided that distance is at most dist (defaults to 2). If multiple results are returned, the order of candidates is preserved.
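
The selection rule can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: it uses plain Levenshtein distance (the library may use another edit-distance variant), and edit_distance and suggest are hypothetical names.

```ocaml
(* Levenshtein distance with two rolling rows. Assumption: the real
   library may compute edit distance differently. *)
let edit_distance a b =
  let la = String.length a and lb = String.length b in
  let prev = Array.init (lb + 1) (fun j -> j) in
  let cur = Array.make (lb + 1) 0 in
  for i = 1 to la do
    cur.(0) <- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      cur.(j) <-
        min (min (prev.(j) + 1) (cur.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit cur 0 prev 0 (lb + 1)
  done;
  prev.(lb)

(* Keep the candidates at the smallest distance from s, provided that
   distance is at most dist. The order of candidates is preserved. *)
let suggest ?(dist = 2) candidates s =
  let scored = List.map (fun c -> (edit_distance c s, c)) candidates in
  let best = List.fold_left (fun m (d, _) -> min m d) max_int scored in
  if best > dist then []
  else List.filter_map (fun (d, c) -> if d = best then Some c else None) scored
```

With this sketch, suggest ["cat"; "bat"; "dog"] "hat" keeps both "cat" and "bat" in that order, since each is one edit away, while a candidate list with nothing within dist yields the empty list.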

val err_did_you_mean : ?pre:(Stdlib.Format.formatter -> unit -> unit) -> ?post:(Stdlib.Format.formatter -> unit -> unit) -> kind:string -> (Stdlib.Format.formatter -> 'a -> unit) -> Stdlib.Format.formatter -> ('a * 'a list) -> unit

err_did_you_mean ~pre ~kind ~post pp_v ppf (v, hints) formats a faulty value v of kind kind and a list of hints that v could have been mistaken for.

pre defaults to unit "Unknown" and post to nop. They surround the faulty value before the "did you mean" part as follows: "%a %s %a%a." pre () kind pp_v v post (). If hints is empty, no "did you mean" part is printed.
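
To make the format string concrete, here is a small stdlib-only sketch that renders just the leading part of the message with the documented defaults. The names pre, post, prefix and example are illustrative, and the layout of the "did you mean" part itself is not reproduced because the text above does not specify it.

```ocaml
(* Defaults described in the text: pre prints "Unknown", post prints
   nothing. *)
let pre ppf () = Format.pp_print_string ppf "Unknown"
let post _ppf () = ()

(* Render only the leading part, before any "did you mean" hints. *)
let prefix ~kind pp_v v =
  Format.asprintf "%a %s %a%a." pre () kind pp_v v post ()

(* A faulty value "colour" of kind "key", printed as a plain string. *)
let example = prefix ~kind:"key" Format.pp_print_string "colour"
```

Here example evaluates to "Unknown key colour.", which is the message a reader would see when no hints are available.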

Decoding

val eoi : t -> bool

eoi d is true iff the decoder is at the end of input.

val byte : t -> int

byte d is the byte at the current position or 0xFFFF if eoi d is true.

val accept_uchar : t -> unit

accept_uchar d accepts a UTF-8 encoded character starting at the current position and moves to the byte after it. Raises Err in case of UTF-8 decoding error.

val accept_byte : t -> unit

accept_byte d accepts the byte at the current position and moves to the next byte. Warning. This is faster than accept_uchar but the client needs to make sure it is not accepting invalid UTF-8 data, i.e. that byte d is a US-ASCII encoded character (<= 0x7F).
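
The trade-off between the two accept functions can be illustrated with a miniature decoder built on the standard library. This is a sketch under assumptions, not the module's implementation: the dec record and the accept helper are hypothetical, errors are reported with Failure rather than Err, and String.get_utf_8_uchar requires OCaml 4.14 or later.

```ocaml
(* Hypothetical miniature decoder: an input string and a byte position. *)
type dec = { input : string; mutable pos : int }

let eoi d = d.pos >= String.length d.input

(* Current byte, or the out-of-range sentinel 0xFFFF at end of input. *)
let byte d = if eoi d then 0xFFFF else Char.code d.input.[d.pos]

(* Validate one UTF-8 sequence and skip all of its bytes. *)
let accept_uchar d =
  let u = String.get_utf_8_uchar d.input d.pos in
  if not (Uchar.utf_decode_is_valid u) then failwith "invalid UTF-8";
  d.pos <- d.pos + Uchar.utf_decode_length u

(* Skip exactly one byte with no validation. *)
let accept_byte d = d.pos <- d.pos + 1

(* Take the fast path only when the current byte is US-ASCII; do
   nothing at end of input. *)
let accept d =
  if eoi d then ()
  else if byte d <= 0x7F then accept_byte d
  else accept_uchar d
```

Since 0xFFFF cannot be a byte value, a lexer can match on byte d without a separate eoi test, and the <= 0x7F check keeps the unvalidated fast path away from multi-byte sequences.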

Token buffer

val tok_reset : t -> unit

tok_reset d resets the token.

val tok_pop : t -> string

tok_pop d returns the token and resets it as tok_reset does.

val tok_accept_uchar : t -> unit

tok_accept_uchar d is like accept_uchar but also adds the UTF-8 byte sequence to the token.

val tok_accept_byte : t -> unit

tok_accept_byte d is like accept_byte but also adds the byte to the token. Warning. accept_byte's warning applies.

val tok_add_byte : t -> int -> unit

tok_add_byte d b adds byte b to the token.

val tok_add_bytes : t -> string -> unit

tok_add_bytes d s adds bytes s to the token.

val tok_add_char : t -> char -> unit

tok_add_char d c adds character c to the token.

val tok_add_uchar : t -> Stdlib.Uchar.t -> unit

tok_add_uchar d u adds the UTF-8 encoding of character u to the token.
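
The token functions behave much like operations on a growable byte buffer. The sketch below models them with Stdlib.Buffer to show how a token is accumulated and popped; it is an assumption about behavior rather than the module's code, and it takes a bare Buffer.t where the real functions take a decoder.

```ocaml
(* Hypothetical token buffer: a plain Buffer.t stands in for the
   decoder's internal token. *)
let tok_reset d = Buffer.clear d

(* Return the accumulated bytes and start a fresh token. *)
let tok_pop d =
  let s = Buffer.contents d in
  tok_reset d;
  s

let tok_add_byte d b = Buffer.add_char d (Char.chr b)
let tok_add_bytes d s = Buffer.add_string d s
let tok_add_char d c = Buffer.add_char d c
let tok_add_uchar d u = Buffer.add_utf_8_uchar d u

(* Usage: build a token from mixed pieces, then pop it. *)
let d = Buffer.create 16

let example =
  tok_add_char d 'c';
  tok_add_byte d 0x61;                   (* the byte for 'a' *)
  tok_add_uchar d (Uchar.of_int 0x00E9); (* U+00E9, two bytes in UTF-8 *)
  tok_add_bytes d "!";
  tok_pop d
```

After the pop the buffer is empty again, so the next token starts from scratch without an explicit tok_reset.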