B0_text.Tdec
Text decoder.
A text decoder inputs UTF-8 data and checks its validity. It updates locations according to advances in the input and has a token buffer used for lexing.
val create : ?file:Tloc.fpath -> string -> t
create ~file input
decodes input
using file
(defaults to Tloc.no_file
) for text location.
val file : t -> Tloc.fpath
file d
is the input file.
val line : t -> Tloc.line_pos
line d
is the current line position. Lines increment as described here.
val loc :
t ->
sbyte:Tloc.pos ->
ebyte:Tloc.pos ->
sline:Tloc.line_pos ->
eline:Tloc.line_pos ->
Tloc.t
loc d ~sbyte ~ebyte ~sline ~eline
is a location with the corresponding position ranges and file according to file
.
val loc_to_here : t -> sbyte:Tloc.pos -> sline:Tloc.line_pos -> Tloc.t
loc_to_here d ~sbyte ~sline
is a location that starts at ~sbyte
and ~sline
and ends at the current decoding position.
loc_here d
is like loc_to_here
with the start position at the current decoding position.
exception Err of Tloc.t * string
The exception for errors. A location and an error message.
val err : Tloc.t -> string -> 'b
err loc msg
raises Err (loc, msg)
with no trace.
val err_to_here :
t ->
sbyte:Tloc.pos ->
sline:Tloc.line_pos ->
('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 ->
'a
err_to_here d ~sbyte ~sline fmt ...
is err (loc_to_here d ~sbyte ~sline) msg
where msg is the message formatted by fmt ...
val err_here :
t ->
('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 ->
'a
err_here d fmt ...
is err (loc_here d) msg
where msg is the message formatted by fmt ...
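The relationship between err and the format-string variants can be sketched as follows. This is a minimal illustration, not the library's implementation: the loc record and the err_at name are hypothetical stand-ins (the real Tloc.t is richer), and Format.kasprintf is only one plausible way to obtain the message string before raising.

```ocaml
(* Hypothetical stand-in for Tloc.t: a file name and a byte position. *)
type loc = { file : string; byte : int }

exception Err of loc * string

(* Raise without recording a backtrace, as "with no trace" suggests. *)
let err loc msg = raise_notrace (Err (loc, msg))

(* Format the message first, then hand the resulting string to [err].
   [err_at] is a hypothetical name; it plays the role of err_to_here and
   err_here once the location has been computed. *)
let err_at loc fmt = Format.kasprintf (fun msg -> err loc msg) fmt

let demo () =
  try err_at { file = "a.txt"; byte = 3 } "unexpected byte %02X" 0x2A
  with Err (loc, msg) -> Printf.sprintf "%s:%d: %s" loc.file loc.byte msg
```

Because err_at never returns, its result type is polymorphic, which is what lets it be used in any expression position, as the 'b in the signatures above indicates.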
err_suggest ~dist candidates s
are the elements of candidates
whose edit distance to s
is the smallest and is at most dist
(defaults to 2
). If multiple results are returned the order of candidates
is preserved.
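The selection rule described for err_suggest can be sketched with a plain Levenshtein distance. This is an assumption for illustration only: the documentation does not say which edit distance the library uses, and the names edit_distance and suggest are hypothetical.

```ocaml
(* Classic dynamic-programming Levenshtein distance between two strings. *)
let edit_distance s t =
  let m = String.length s and n = String.length t in
  let prev = Array.init (n + 1) (fun j -> j) in
  let cur = Array.make (n + 1) 0 in
  for i = 1 to m do
    cur.(0) <- i;
    for j = 1 to n do
      let cost = if s.[i - 1] = t.[j - 1] then 0 else 1 in
      cur.(j) <-
        min (min (prev.(j) + 1) (cur.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit cur 0 prev 0 (n + 1)
  done;
  prev.(n)

(* Keep the candidates closest to [s], provided they are within [dist].
   List.filter_map keeps the original order of [candidates]. *)
let suggest ?(dist = 2) candidates s =
  let scored = List.map (fun c -> (edit_distance c s, c)) candidates in
  let best = List.fold_left (fun acc (d, _) -> min acc d) max_int scored in
  if best > dist then []
  else List.filter_map (fun (d, c) -> if d = best then Some c else None) scored
```

With this sketch, candidates tied for the smallest distance are all returned, and nothing is returned when even the closest candidate is further than dist.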
pp_and_enum ~empty pp_v ppf l
formats l
according to its length.
0, formats empty (defaults to nop).
1, formats the element with pp_v.
2, formats "%a and %a" with the list elements.
n, formats "%a, ... and %a" with the list elements.
pp_or_enum
is like pp_and_enum
but uses "or" instead of "and".
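The length-dependent formatting can be sketched with the standard Format module. This is a minimal reimplementation under assumptions, not the library's code: the shared helper pp_enum and its ~conj argument are hypothetical, and the default for empty is taken to print nothing.

```ocaml
(* Format a list as an enumeration whose last two elements are joined by
   [conj] ("and" or "or") and whose earlier elements are comma separated. *)
let pp_enum ~conj ?(empty = fun _ () -> ()) pp_v ppf l =
  let rec loop = function
    | [] -> empty ppf ()
    | [ v ] -> pp_v ppf v
    | [ v0; v1 ] -> Format.fprintf ppf "%a %s %a" pp_v v0 conj pp_v v1
    | v :: vs ->
        Format.fprintf ppf "%a, " pp_v v;
        loop vs
  in
  loop l

let pp_and_enum ?empty pp_v ppf l = pp_enum ~conj:"and" ?empty pp_v ppf l
let pp_or_enum ?empty pp_v ppf l = pp_enum ~conj:"or" ?empty pp_v ppf l

(* Convenience for rendering to a string in examples. *)
let and_to_string l =
  Format.asprintf "%a" (pp_and_enum Format.pp_print_string) l
```

For instance, and_to_string applied to a three-element list of strings a, b and c yields the text "a, b and c".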
did_you_mean pp_v
formats "Did you mean %a ?"
with pp_or_enum
if the list is non-empty and nop
otherwise.
must_be pp_v
formats "Must be %a."
with pp_or_enum
if the list is non-empty and nop
otherwise.
pp_unknown ~kind pp_v
formats "Unknown %a %a." kind () pp_v
.
pp_unknown' ~kind pp_v ~hint (v, hints)
formats pp_unknown
followed by a space and hint pp_v hints
if hints
is non-empty.
val eoi : t -> bool
eoi d
is true
iff the decoder is at the end of input.
val byte : t -> int
byte d
is the byte at current position or 0xFFFF
if eoi d
is true
.
val accept_uchar : t -> unit
accept_uchar d
accepts a UTF-8 encoded character starting at the current position and moves to the byte after it. Raises Err
in case of UTF-8 decoding error.
val accept_byte : t -> unit
accept_byte d
accepts the byte at the current position and moves to the next byte. Warning. Faster than accept_uchar
but the client needs to make sure it's not accepting invalid UTF-8 data, i.e. that byte d
is a US-ASCII encoded character (i.e. <= 0x7F
).
val tok_reset : t -> unit
tok_reset d
resets the token.
val tok_accept_uchar : t -> unit
tok_accept_uchar d
is like accept_uchar
but also adds the UTF-8 byte sequence to the token.
val tok_accept_byte : t -> unit
tok_accept_byte d
is like accept_byte
but also adds the byte to the token. Warning. accept_byte
's warning applies.
val tok_add_byte : t -> int -> unit
tok_add_byte d b
adds byte b
to the token.
val tok_add_bytes : t -> string -> unit
tok_add_bytes d s
adds bytes s
to the token.
val tok_add_char : t -> char -> unit
tok_add_char d c
adds character c
to the token.
val tok_add_uchar : t -> Stdlib.Uchar.t -> unit
tok_add_uchar d u
adds the UTF-8 encoding of character u
to the token.
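How the byte-level and token functions combine in a lexer can be sketched with a toy model. The dec record below is NOT the real decoder type: its fields are hypothetical, it tracks no lines or locations, and it performs no UTF-8 validation. It only mirrors the documented behaviour of eoi, byte, accept_byte and the tok_* functions, and reads the token directly from a Buffer.t.

```ocaml
(* A toy decoder modelling the operations described above. *)
type dec = { input : string; mutable pos : int; tok : Buffer.t }

let create input = { input; pos = 0; tok = Buffer.create 16 }
let eoi d = d.pos >= String.length d.input

(* 0xFFFF is the end-of-input sentinel, outside the range of any byte. *)
let byte d = if eoi d then 0xFFFF else Char.code d.input.[d.pos]
let accept_byte d = d.pos <- d.pos + 1
let tok_reset d = Buffer.clear d.tok
let tok_add_byte d b = Buffer.add_char d.tok (Char.chr b)

let tok_accept_byte d =
  tok_add_byte d (byte d);
  accept_byte d

let tok_add_uchar d u = Buffer.add_utf_8_uchar d.tok u

let is_letter b = (0x61 <= b && b <= 0x7A) || (0x41 <= b && b <= 0x5A)

(* Lex a run of ASCII letters. The sentinel is not a letter, so the loop
   stops at end of input without a separate [eoi] test. Only US-ASCII
   bytes are accepted, so the accept_byte warning is respected. *)
let lex_word d =
  tok_reset d;
  while is_letter (byte d) do
    tok_accept_byte d
  done;
  Buffer.contents d.tok
```

Returning a sentinel from byte rather than raising at end of input keeps lexing loops like lex_word free of explicit bounds checks.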