Module B0_tlex.Tdec

Text decoder.

A text decoder inputs UTF-8 data and checks its validity. It updates locations according to advances in the input and has a token buffer used for lexing.


type t

The type for UTF-8 text decoders.

val create : ?file:B0_std.Fpath.t -> string -> t

create ~file input decodes input, using file (defaults to Tloc.no_file) for text locations.


val file : t -> B0_std.Fpath.t

file d is the input file.

val pos : t -> Tloc.pos

pos d is the current decoding position.

val line : t -> Tloc.pos * Tloc.line

line d is the current line position. Lines increment as described here.
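The following is a minimal sketch of how a decoder might track pos d and line d as it advances. It is not the real Tdec.t. The names dec, make and next_byte are hypothetical. It assumes that Tloc.pos and Tloc.line are plain ints, that the pair returned by line is (byte position where the current line starts, line number), that line numbers start at 1, and that only LF ('\n') starts a new line. The actual increment rules are the ones referred to above.

```ocaml
(* Hypothetical model of a decoder's position state; not the real Tdec.t.
   Assumptions: positions and line numbers are ints, lines are 1-based,
   and only '\n' (LF) starts a new line. *)
type dec = {
  input : string;
  mutable pos : int;       (* current byte position *)
  mutable line_pos : int;  (* byte position where the current line starts *)
  mutable line : int;      (* current line number *)
}

let make input = { input; pos = 0; line_pos = 0; line = 1 }

let pos d = d.pos
let line d = (d.line_pos, d.line)

(* Move past the current byte; after a newline, the next line starts at
   the new position. Does nothing at the end of input. *)
let next_byte d =
  if d.pos < String.length d.input then begin
    let c = d.input.[d.pos] in
    d.pos <- d.pos + 1;
    if c = '\n' then begin
      d.line <- d.line + 1;
      d.line_pos <- d.pos
    end
  end
```

For example, after consuming the four bytes "ab\nc" of "ab\ncd", this sketch is at byte 4, on line 2, which starts at byte 3.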

val loc_to_here : t -> byte_s:Tloc.pos -> line_s:(Tloc.pos * Tloc.line) -> Tloc.t

loc_to_here d ~byte_s ~line_s is a location that starts at ~byte_s and ~line_s and ends at the current decoding position.

val loc_here : t -> Tloc.t

loc_here d is like loc_to_here with the start position at the current decoding position.

val loc : t -> byte_s:Tloc.pos -> byte_e:Tloc.pos -> line_s:(Tloc.pos * Tloc.line) -> line_e:(Tloc.pos * Tloc.line) -> Tloc.t

loc d ~byte_s ~byte_e ~line_s ~line_e is a location with the corresponding position range.
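A lexer typically uses these functions by remembering pos d and line d where a token starts, consuming the token, and then asking for loc_to_here. The sketch below shows that pattern with hypothetical stand-ins: loc is a plain record rather than Tloc.t (which also carries a file), and lex_ident is an illustrative helper. Here byte_e is simply the current decoding position; whether the real Tloc end position is inclusive or exclusive is not stated in this section.

```ocaml
(* Hypothetical stand-ins for Tloc.t and Tdec.t. Lines are represented as
   (line start byte, line number) pairs. *)
type loc = {
  byte_s : int; byte_e : int;
  line_s : int * int; line_e : int * int;
}

type dec = {
  input : string;
  mutable pos : int;
  line : int * int;  (* never changes in this sketch: no newlines consumed *)
}

let loc _d ~byte_s ~byte_e ~line_s ~line_e = { byte_s; byte_e; line_s; line_e }

(* From a remembered start to the current decoding position. *)
let loc_to_here d ~byte_s ~line_s =
  loc d ~byte_s ~byte_e:d.pos ~line_s ~line_e:d.line

(* Start and end both at the current decoding position. *)
let loc_here d = loc_to_here d ~byte_s:d.pos ~line_s:d.line

(* Hypothetical helper: remember the start, consume lowercase ASCII
   letters, then return the location of what was consumed. *)
let lex_ident d =
  let byte_s = d.pos and line_s = d.line in
  while d.pos < String.length d.input
        && (match d.input.[d.pos] with 'a' .. 'z' -> true | _ -> false)
  do d.pos <- d.pos + 1 done;
  loc_to_here d ~byte_s ~line_s
```

On the input "foo bar", lex_ident returns a location from byte 0 to byte 3 and leaves the decoder on the space.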


exception Err of Tloc.t * string

The exception for errors. It carries a location and an error message.

val err : Tloc.t -> string -> 'b

err loc msg raises Err (loc, msg) with no trace.

val err_to_here : t -> byte_s:Tloc.pos -> line_s:(Tloc.pos * Tloc.line) -> ('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a

err_to_here d ~byte_s ~line_s fmt ... raises Err with no trace. The location spans from the given start position to the current decoding position and the message is formatted according to fmt.

val err_here : t -> ('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a

err_here d is like err_to_here with the start position at the current decoding position.
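"With no trace" reads as raising without recording a backtrace, which is what the standard raise_notrace does. The sketch below shows one plausible way to build these functions from raise_notrace and Format.kasprintf, which formats a message and then hands the resulting string to a continuation. The loc record and dec type are hypothetical stand-ins for Tloc.t and Tdec.t.

```ocaml
(* Hypothetical stand-ins for Tloc.t and the decoder state. *)
type loc = { byte_s : int; byte_e : int }
type dec = { pos : int }

exception Err of loc * string

(* raise_notrace skips backtrace recording: the error is meant to be
   caught and reported to the user, not debugged. *)
let err loc msg = raise_notrace (Err (loc, msg))

(* Format the message, then raise it with a location that spans from the
   given start to the current decoding position. *)
let err_to_here d ~byte_s fmt =
  Format.kasprintf (err { byte_s; byte_e = d.pos }) fmt

let err_here d fmt = err_to_here d ~byte_s:d.pos fmt
```

Because the format argument has type ('a, Format.formatter, unit, 'b) format4, callers can also use %a with Format printers in their messages.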


val eoi : t -> bool

eoi d is true iff the decoder is at the end of input.

val byte : t -> int

byte d is the byte at the current position, or 0xFFFF if eoi d is true.
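Since 0xFFFF lies outside the byte range 0x00 to 0xFF, it can never be confused with real input, so a lexer can dispatch on byte d and treat the end of input as just another case. A minimal sketch, assuming a hypothetical decoder that is only an input string and a byte position; describe is an illustrative helper.

```ocaml
(* Hypothetical minimal decoder: the input and a byte position. *)
type dec = { input : string; mutable pos : int }

let eoi d = d.pos >= String.length d.input

(* 0xFFFF is not a byte value, so it safely marks the end of input. *)
let byte d = if eoi d then 0xFFFF else Char.code d.input.[d.pos]

(* Hypothetical dispatch on the current byte. *)
let describe d =
  match byte d with
  | 0xFFFF -> "end of input"
  | 0x0A -> "newline"
  | b when b <= 0x7F -> "ASCII"
  | _ -> "non-ASCII byte"
```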

val accept_uchar : t -> unit

accept_uchar d accepts a UTF-8 encoded character starting at the current position and moves to the byte after it. Raises Err in case of a UTF-8 decoding error.
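One way to implement this check with the standard library (OCaml 4.14 or later) is String.get_utf_8_uchar, whose result tells whether the sequence is valid and how many bytes it spans. The sketch below is an illustration under that assumption, not the module's actual code. The Err exception here carries a byte position instead of a Tloc.t, the sketch does not update lines, and it assumes the caller has already checked that the decoder is not at the end of input.

```ocaml
(* Hypothetical decoder state; requires OCaml >= 4.14.
   Err carries a byte position as a stand-in for Tloc.t. *)
exception Err of int * string

type dec = { input : string; mutable pos : int }

(* Precondition (assumed): d.pos is inside the input. *)
let accept_uchar d =
  let u = String.get_utf_8_uchar d.input d.pos in
  if not (Uchar.utf_decode_is_valid u) then
    raise_notrace (Err (d.pos, "invalid UTF-8 byte sequence"));
  (* For a valid decode this is the sequence length, 1 to 4 bytes. *)
  d.pos <- d.pos + Uchar.utf_decode_length u
```

On "\xC3\xA9x" (é followed by x), the first call moves from byte 0 to byte 2 and the second to byte 3; on "\xC3(" it raises Err at byte 0, because 0xC3 must be followed by a continuation byte.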

val accept_byte : t -> unit

accept_byte d accepts the byte at the current position and moves to the next byte. Warning. This is faster than accept_uchar, but the client must make sure it does not accept invalid UTF-8 data, i.e. that byte d is a US-ASCII encoded character (<= 0x7F).
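A common way to respect this warning is to take the fast path only when byte d <= 0x7F and to fall back to accept_uchar otherwise, since any byte above 0x7F may start a multi-byte sequence that needs validation. The sketch below applies that rule to a hypothetical skip_to_eol helper (for example, to skip a line comment). The decoder is a simplified stand-in, and its accept_uchar fails with Failure instead of raising Err.

```ocaml
(* Hypothetical decoder; requires OCaml >= 4.14 for String.get_utf_8_uchar. *)
type dec = { input : string; mutable pos : int }

let eoi d = d.pos >= String.length d.input
let byte d = if eoi d then 0xFFFF else Char.code d.input.[d.pos]

(* Fast path: no validation, only safe on US-ASCII bytes. *)
let accept_byte d = d.pos <- d.pos + 1

let accept_uchar d =
  let u = String.get_utf_8_uchar d.input d.pos in
  if not (Uchar.utf_decode_is_valid u) then failwith "invalid UTF-8";
  d.pos <- d.pos + Uchar.utf_decode_length u

(* Skip to just after the next newline (or to the end of input). *)
let skip_to_eol d =
  while not (eoi d) && byte d <> 0x0A do
    if byte d <= 0x7F then accept_byte d  (* ASCII: fast path is safe *)
    else accept_uchar d                    (* may start a multi-byte sequence *)
  done;
  if not (eoi d) then accept_byte d        (* the newline itself is ASCII *)
```

The eoi test comes first because byte d returns 0xFFFF at the end of input, which is above 0x7F and would otherwise be routed to accept_uchar.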

Token buffer

val tok_reset : t -> unit

tok_reset d resets the token.

val tok_pop : t -> string

tok_pop d returns the token and resets it with tok_reset.
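A natural representation for the token is a Buffer.t held by the decoder. The sketch below assumes that representation, which this section does not confirm. Buffer.clear empties the buffer but keeps its storage, so one buffer can be reused for every token without reallocating.

```ocaml
(* Hypothetical: the token is kept in a Buffer.t inside the decoder. *)
type dec = { tok : Buffer.t }

let tok_reset d = Buffer.clear d.tok  (* keeps the allocated storage *)

let tok_pop d =
  let t = Buffer.contents d.tok in
  tok_reset d;
  t
```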

val tok_accept_uchar : t -> unit

tok_accept_uchar d is like accept_uchar but also adds the UTF-8 byte sequence to the token.

val tok_accept_byte : t -> unit

tok_accept_byte d is like accept_byte but also adds the byte to the token. Warning. accept_byte's warning applies.
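These functions combine consuming input with recording it, so the token ends up with exactly the bytes that were accepted. In the sketch below, a hypothetical decoder with a Buffer.t token copies the validated sequence from the input with Buffer.add_substring. It requires OCaml 4.14 or later, uses Failure instead of Err, and assumes the caller has checked that the decoder is not at the end of input.

```ocaml
(* Hypothetical decoder with a token buffer; requires OCaml >= 4.14. *)
type dec = { input : string; mutable pos : int; tok : Buffer.t }

(* Same warning as accept_byte: only for US-ASCII bytes. *)
let tok_accept_byte d =
  Buffer.add_char d.tok d.input.[d.pos];
  d.pos <- d.pos + 1

(* Validate one UTF-8 sequence, copy its bytes to the token, move past it. *)
let tok_accept_uchar d =
  let u = String.get_utf_8_uchar d.input d.pos in
  if not (Uchar.utf_decode_is_valid u) then failwith "invalid UTF-8";
  let len = Uchar.utf_decode_length u in
  Buffer.add_substring d.tok d.input d.pos len;
  d.pos <- d.pos + len
```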

val tok_add_byte : t -> int -> unit

tok_add_byte d b adds byte b to the token.

val tok_add_bytes : t -> string -> unit

tok_add_bytes d s adds the bytes s to the token.

val tok_add_char : t -> char -> unit

tok_add_char d b adds character b to the token.

val tok_add_uchar : t -> Stdlib.Uchar.t -> unit

tok_add_uchar d u adds the UTF-8 encoding of character u to the token.
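Unlike the tok_accept_* functions, these only mention the token, which suggests they let a lexer store something other than the raw input, for instance the decoded value of an escape sequence. Assuming the token is a Buffer.t, they map directly onto standard Buffer operations, as in this hypothetical sketch; Buffer.add_utf_8_uchar performs the UTF-8 encoding.

```ocaml
(* Hypothetical: the token is a Buffer.t. These only touch the token. *)
type dec = { tok : Buffer.t }

(* Char.chr raises Invalid_argument unless b is in 0..255. *)
let tok_add_byte d b = Buffer.add_char d.tok (Char.chr b)
let tok_add_bytes d s = Buffer.add_string d.tok s
let tok_add_char d c = Buffer.add_char d.tok c
let tok_add_uchar d u = Buffer.add_utf_8_uchar d.tok u
```

For example, adding 'a', the byte 0x62, the bytes "cd" and U+00E9 produces the token "abcd\xC3\xA9", where the last two bytes are the UTF-8 encoding of é.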