Module Down_std.Txt

Handling of possibly malformed UTF-8 text.

Note. The start, after and before arguments can be out of bounds and, in particular, equal to the string length. Forward searches return the string length when nothing is found; backward searches return 0 when nothing is found.

val find_next : sat:(char -> bool) -> string -> start:int -> int

find_next ~sat s ~start is either String.length s or the index of the first byte at or after start that satisfies sat.

val find_prev : sat:(char -> bool) -> string -> start:int -> int

find_prev ~sat s ~start is either 0 or the index of the first byte at or before start that satisfies sat.
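
The search semantics described above can be sketched with the standard library alone. This is an illustrative re-implementation, not the source of Down_std.Txt; the clamping of out-of-bounds start values is an assumption drawn from the note at the top of this page.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)

(* Forward search: String.length s when no byte satisfies sat. *)
let find_next ~sat s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len
    else if sat s.[i] then i
    else loop (i + 1)
  in
  loop (max 0 start)

(* Backward search: 0 when no byte satisfies sat. *)
let find_prev ~sat s ~start =
  let rec loop i =
    if i <= 0 then 0
    else if sat s.[i] then i
    else loop (i - 1)
  in
  (* Assumption: a start past the end is clamped to the last index. *)
  loop (min start (String.length s - 1))
```

With these conventions a result of 0 from a backward search does not by itself distinguish "found at index 0" from "not found"; a caller that needs to know can test the byte at index 0 with sat.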

val keep_next_len : sat:(char -> bool) -> string -> start:int -> int

keep_next_len ~sat s ~start is the number of consecutive bytes satisfying sat, going forward from start, included.

val keep_prev_len : sat:(char -> bool) -> string -> start:int -> int

keep_prev_len ~sat s ~start is the number of consecutive bytes satisfying sat, going backward from start, included.
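
A minimal sketch of the two run-length functions, assuming out-of-bounds start values simply yield 0. The code is a stand-alone illustration and is not taken from Down_std; is_digit is a hypothetical helper used only for the example.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)

(* Length of the run of sat bytes going forward from start, included. *)
let keep_next_len ~sat s ~start =
  let len = String.length s in
  let rec loop i = if i < len && sat s.[i] then loop (i + 1) else i in
  let start = max 0 start in
  loop start - start

(* Length of the run of sat bytes going backward from start, included. *)
let keep_prev_len ~sat s ~start =
  let rec loop i = if i >= 0 && sat s.[i] then loop (i - 1) else i in
  let start = min start (String.length s - 1) in
  start - loop start

(* Hypothetical predicate for the example. *)
let is_digit c = '0' <= c && c <= '9'
```

For instance, on "ab123c" the forward run of digits from index 2 and the backward run from index 4 both have length 3.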

Lines

val lines : string -> string list

lines s splits s into lines separated by CR, CRLF or LF. This is [""] on the empty string.
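
The splitting rule can be sketched as follows. This is a hedged, stand-alone illustration rather than the module's code; in particular, treating a trailing line ending as introducing a final empty line is an assumption of this sketch.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)
let lines s =
  let len = String.length s in
  (* first is the start of the current line, i the byte being examined. *)
  let rec loop acc first i =
    if i >= len then List.rev (String.sub s first (len - first) :: acc)
    else match s.[i] with
      | '\n' ->
          loop (String.sub s first (i - first) :: acc) (i + 1) (i + 1)
      | '\r' ->
          (* A CR immediately followed by LF counts as one separator. *)
          let next =
            if i + 1 < len && s.[i + 1] = '\n' then i + 2 else i + 1
          in
          loop (String.sub s first (i - first) :: acc) next next
      | _ -> loop acc first (i + 1)
  in
  loop [] 0 0
```

Because a string with n separators always yields n + 1 lines here, the empty string yields the single empty line [""], matching the documented behaviour.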

val is_eol : char -> bool

is_eol c is true iff c is '\r' or '\n'.

val find_next_eol : string -> start:int -> int

find_next_eol s ~start is either String.length s or the index of the byte at or after start that satisfies is_eol.

val find_prev_eol : string -> start:int -> int

find_prev_eol s ~start is either 0 or the index of the byte at or before start that satisfies is_eol.

val find_prev_sol : string -> start:int -> int

find_prev_sol s ~start is either 0 or the position after the byte at or before start that satisfies is_eol. This can be String.length s.
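
To illustrate how a start-of-line position relates to the previous end-of-line byte, here is a small stand-alone sketch. It is not the module's source; the way it resolves the case where the end-of-line byte sits at index 0 is an assumption.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)
let is_eol c = c = '\r' || c = '\n'

let find_prev_eol s ~start =
  let rec loop i =
    if i <= 0 then 0 else if is_eol s.[i] then i else loop (i - 1)
  in
  loop (min start (String.length s - 1))

(* Start of line: one past the previous end-of-line byte, or 0. *)
let find_prev_sol s ~start =
  let i = find_prev_eol s ~start in
  (* Assumption: check the byte so that index 0 is handled either way. *)
  if i < String.length s && is_eol s.[i] then i + 1 else i
```

On "ab\n" with start 2 the result is 3, which is the string length: the line that starts after the final newline is empty.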

UTF-8 encoded Unicode characters

val utf_8_decode_len : char -> int

utf_8_decode_len b is the length of a UTF-8 encoded Unicode character starting with byte b. This is 1 on UTF-8 continuation or malformed bytes.
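
A possible shape for such a function, based on the lead byte ranges of well-formed UTF-8, is sketched below. It is an illustration and not the module's code; which exact bytes the module treats as malformed is an assumption here (this sketch rejects 0xC0, 0xC1 and 0xF5 to 0xFF). count_chars is a hypothetical helper showing how the length can drive a forward scan.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)
let utf_8_decode_len = function
  | '\x00' .. '\x7F' -> 1 (* US-ASCII *)
  | '\xC2' .. '\xDF' -> 2 (* lead byte of a 2-byte sequence *)
  | '\xE0' .. '\xEF' -> 3 (* lead byte of a 3-byte sequence *)
  | '\xF0' .. '\xF4' -> 4 (* lead byte of a 4-byte sequence *)
  | _ -> 1 (* continuation byte or malformed lead byte *)

(* Hypothetical helper: counts decode steps over s. Sequences are not
   validated, only the lead byte is inspected. *)
let count_chars s =
  let len = String.length s in
  let rec loop n i =
    if i >= len then n else loop (n + 1) (i + utf_8_decode_len s.[i])
  in
  loop 0 0
```

Returning 1 for continuation and malformed bytes guarantees that a scan always makes progress, one byte at a time, through damaged input.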

val is_utf_8_decode : char -> bool

is_utf_8_decode c is true iff c is not a UTF-8 continuation byte. This means c is either a UTF-8 start byte or a malformed UTF-8 byte.

val find_next_utf_8_decode : string -> start:int -> int

find_next_utf_8_decode s ~start is either String.length s or the index of the byte at or after start that satisfies is_utf_8_decode.

val find_prev_utf_8_decode : string -> start:int -> int

find_prev_utf_8_decode s ~start is either 0 or the index of the byte at or before start that satisfies is_utf_8_decode.
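
The continuation-byte test and the backward resynchronisation it enables can be sketched as follows. This is a stand-alone illustration under the usual UTF-8 bit layout (continuation bytes have the form 10xxxxxx) and is not the module's source.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)

(* True unless the two top bits are 10, i.e. unless c is in 0x80-0xBF. *)
let is_utf_8_decode c = Char.code c land 0xC0 <> 0x80

(* Move back to the nearest byte at which decoding can start. *)
let find_prev_utf_8_decode s ~start =
  let rec loop i =
    if i <= 0 then 0
    else if is_utf_8_decode s.[i] then i
    else loop (i - 1)
  in
  loop (min start (String.length s - 1))
```

For example, in "a\xC3\xA9b" index 2 is the continuation byte of the two-byte sequence, so searching backward from 2 lands on its start byte at index 1.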

Whitespace

val is_white : char -> bool

is_white c is true iff c is US-ASCII whitespace (0x20, 0x09, 0x0A, 0x0B, 0x0C or 0x0D).

val find_next_white : string -> start:int -> int

find_next_white s ~start is either String.length s or the first byte position at or after start such that is_white is true.

val find_prev_white : string -> start:int -> int

find_prev_white s ~start is either 0 or the first byte position at or before start such that is_white is true.

Words

val find_next_after_eow : string -> start:int -> int

find_next_after_eow s ~start is either String.length s or the byte position of the first is_white byte found after skipping, from start, first white and then non-white bytes.
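
The "skip white, then skip non-white" rule can be sketched as below. This is a stand-alone illustration, not the module's code; the clamping of start into the string bounds is an assumption.

```ocaml
(* Illustrative sketch only; not Down_std's actual implementation. *)
let is_white = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
  | _ -> false

let find_next_after_eow s ~start =
  let len = String.length s in
  (* Advance while sat holds; stops at len at the latest. *)
  let rec skip sat i = if i < len && sat s.[i] then skip sat (i + 1) else i in
  let start = min len (max 0 start) in
  let word_start = skip is_white start in
  skip (fun c -> not (is_white c)) word_start
```

On "foo  bar baz", starting at 0 yields 3 (just after "foo"), starting at 3 yields 8 (just after "bar"), and starting inside the last word yields the string length.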

val find_prev_sow : string -> start:int -> int

find_prev_sow s ~start is either 0 or the byte position after skipping backward, from start, first white and then non-white.

Grapheme clusters and TTY width

Note. This is a simple notion of grapheme cluster based on Uucp.Break.tty_width_hint.

val find_next_gc : string -> after:int -> int

find_next_gc s ~after is String.length s or the byte position of the grapheme cluster after the one starting at after.

val find_next_gc_and_tty_width : string -> after:int -> int * int

find_next_gc_and_tty_width s ~after is like find_next_gc but also returns, in the second component, the TTY width of the grapheme cluster at after.

val find_prev_gc : string -> before:int -> int

find_prev_gc s ~before is 0 or the byte position of the grapheme cluster before the one starting at before.

val find_prev_eol_and_tty_width : string -> before:int -> int * int

find_prev_eol_and_tty_width s ~before is, in the first component, either 0 or the index of the byte before position before that satisfies is_eol and, in the second component, the TTY width needed to go from that index to before.

val find_next_tty_width_or_eol : string -> start:int -> w:int -> int

find_next_tty_width_or_eol s ~start ~w is the index of the grapheme cluster reached after TTY width w at or after start, or the index of the next end of line if that comes first.