Module Uutf.String.UTF_8

module UTF_8: sig .. end
UTF-8 encoded strings.

The UTF_8 module provides a datatype that asserts that a string value is valid UTF-8 and a few functions that operate under this assumption.



UTF-8 encoded strings


type t = private string 
The type for valid UTF-8 strings.
val is_valid : string -> bool
is_valid s is true iff the string s is valid UTF-8.
val of_string : string -> (t, t) Pervasives.result
of_string s is Ok s if s is valid UTF-8 and Error b if it is not. In the error case, b is the result of a best-effort decode of s in which each UTF-8 decoding error is replaced by at least one replacement character Uchar.u_rep.
val v : string -> t
v s is like of_string s but raises Invalid_argument if s is not valid UTF-8.
val to_string : t -> string
to_string s is s as a string.
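
The following is a minimal sketch of how a private string type can carry the validity assertion. It is not this module's implementation: the module name Utf8_sketch and the helpers lossy and byte_length are hypothetical, and the code assumes OCaml 4.14 or later for String.is_valid_utf_8 and String.get_utf_8_uchar from the standard library.

```ocaml
(* Hypothetical stand-in for the documented signature; needs OCaml >= 4.14. *)
module Utf8_sketch : sig
  type t = private string
  val is_valid : string -> bool
  val of_string : string -> (t, t) result
  val v : string -> t
  val to_string : t -> string
end = struct
  type t = string

  let is_valid = String.is_valid_utf_8

  (* Best-effort decode: every decoding error is re-encoded as U+FFFD. *)
  let lossy s =
    let b = Buffer.create (String.length s) in
    let rec loop i =
      if i < String.length s then begin
        let d = String.get_utf_8_uchar s i in
        (* On a decoding error, utf_decode_uchar yields Uchar.rep. *)
        Buffer.add_utf_8_uchar b (Uchar.utf_decode_uchar d);
        loop (i + Uchar.utf_decode_length d)
      end
    in
    loop 0;
    Buffer.contents b

  let of_string s = if is_valid s then Ok s else Error (lossy s)

  let v s =
    if is_valid s then s else invalid_arg "Utf8_sketch.v: invalid UTF-8"

  let to_string s = s
end

(* Outside the module a [t] can be read as a string through a coercion,
   but it can only be built through [of_string] or [v]. *)
let byte_length (s : Utf8_sketch.t) = String.length (s :> string)
```

Because the type is private, client code cannot forge a value of type t from an arbitrary string, which is what lets the other functions skip revalidation.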

Decode indices

In a string s, a decode index i is a valid string index at which a Unicode character can be decoded. In a non-empty valid UTF-8 encoded string the index 0 is always a valid decode index.

val next_index : t -> int -> int
next_index s i is the smallest decode index after the string byte position i or the string length if there is no such index.
Raises Invalid_argument if i is not a valid string position.
val prev_index : t -> int -> int
prev_index s i is the largest decode index before the string byte position i or 0 if there is no such index.
Raises Invalid_argument if i is not a valid string position.
val fold_indices : ('a -> int -> 'a) -> 'a -> t -> 'a
fold_indices f acc s folds over the decode indices of s.
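
To make the notion of decode index concrete, here is a sketch on plain strings that are assumed to be valid UTF-8. It relies on the fact that, in valid UTF-8, a byte starts a character exactly when it is not a continuation byte (top two bits 10). The helper is_decode_index is hypothetical, and accepting any i with 0 <= i <= String.length s as a valid position is an assumption about what the documented functions check.

```ocaml
(* All functions below assume [s] is valid UTF-8. *)

(* A byte is a decode index iff it is not a continuation byte 10xxxxxx. *)
let is_decode_index s i =
  i >= 0 && i < String.length s && Char.code s.[i] land 0xC0 <> 0x80

(* Smallest decode index strictly after position [i], or the length. *)
let next_index s i =
  let len = String.length s in
  if i < 0 || i > len then invalid_arg "next_index";
  let rec loop j =
    if j >= len then len
    else if is_decode_index s j then j
    else loop (j + 1)
  in
  loop (i + 1)

(* Closest decode index strictly before position [i], or 0. *)
let prev_index s i =
  let len = String.length s in
  if i < 0 || i > len then invalid_arg "prev_index";
  let rec loop j =
    if j <= 0 then 0
    else if is_decode_index s j then j
    else loop (j - 1)
  in
  loop (i - 1)

(* Fold [f] over the decode indices of [s] in increasing order. *)
let fold_indices f acc s =
  let len = String.length s in
  let rec loop acc i =
    if i >= len then acc
    else loop (if is_decode_index s i then f acc i else acc) (i + 1)
  in
  loop acc 0
```

For the four-byte string "a\xC3\xA9b" (the text "aéb"), the decode indices are 0, 1 and 3; index 2 is the second byte of "é" and is skipped.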

Unicode characters


val get_uchar : t -> int -> Uchar.t
get_uchar s i is the Unicode character at the decode index i.
Raises Invalid_argument if i is not a decode index of s.
val fold_uchars : ('a -> int -> Uchar.t -> 'a) -> 'a -> t -> 'a
fold_uchars f acc s folds over the Unicode characters of s. The decode index is passed with the character to f.
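
A fold of this shape can be sketched on a plain string with the standard library decoder. This is an illustration rather than the module's code: it assumes OCaml 4.14 or later for String.get_utf_8_uchar, and it assumes the input is valid UTF-8 (on invalid input the decoder would yield Uchar.rep instead of failing).

```ocaml
(* Fold [f] over the characters of [s], passing each character's
   decode index. Assumes [s] is valid UTF-8 and OCaml >= 4.14. *)
let fold_uchars f acc s =
  let len = String.length s in
  let rec loop acc i =
    if i >= len then acc
    else begin
      let d = String.get_utf_8_uchar s i in
      let u = Uchar.utf_decode_uchar d in
      (* Jump to the next decode index by the number of bytes consumed. *)
      loop (f acc i u) (i + Uchar.utf_decode_length d)
    end
  in
  loop acc 0

(* Example use: count characters rather than bytes. *)
let uchar_count s = fold_uchars (fun n _ _ -> n + 1) 0 s
```

On "a\xC3\xA9b" the function f is called with the pairs (0, U+0061), (1, U+00E9) and (3, U+0062), so uchar_count returns 3 while String.length returns 4.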

Operations


val append : t -> t -> t
append s s' appends s' to s.
val concat : t -> t list -> t
concat sep ss concatenates the list of strings ss, separating consecutive elements of the list with sep.
Raises Invalid_argument if the result is longer than Sys.max_string_length.

Comparisons


val equal : t -> t -> bool
equal s s' is true iff s and s' are bytewise equal.
val compare : t -> t -> int
compare s s' totally orders s and s' according to their byte representation.
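
As a small illustration of what a bytewise order means in practice, the sketch below sorts plain strings with String.compare from the standard library, used here as a stand-in for a bytewise comparison. For valid UTF-8, bytewise order coincides with code point order, which is not the same as alphabetical or locale-aware collation.

```ocaml
(* Stand-in for a bytewise comparison on valid UTF-8 strings. *)
let sorted = List.sort String.compare [ "z"; "\xC3\xA9"; "a"; "Z" ]

(* "Z" (0x5A) < "a" (0x61) < "z" (0x7A) < "é" (0xC3 0xA9) *)
let expected = [ "Z"; "a"; "z"; "\xC3\xA9" ]
```

Uppercase letters sort before lowercase ones and "é" sorts after "z", because only the encoded bytes are compared.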