Down_std.Txt
UTF-8 text handling, possibly malformed.
Note. The start, after and before arguments can be out of bounds and, in particular, equal to the string length. Finding forwards returns the string length if nothing is found; finding backwards returns 0 if nothing is found.
find_next ~sat s ~start
is either String.length s or the index of the first byte at or after start that satisfies sat.
find_prev ~sat s ~start
is either 0 or the index of the first byte at or before start, scanning backwards, that satisfies sat.
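The two scanners can be sketched in plain OCaml as follows. This is a minimal illustration of the documented contract, not the module's actual code; clamping out-of-bounds start values to the string is an assumption.

```ocaml
(* Sketch of the documented contract; clamping out-of-bounds [start]
   values is an assumption. *)

(* Scan forwards from [start]; return [String.length s] if nothing matches. *)
let find_next ~sat s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len else if sat s.[i] then i else loop (i + 1)
  in
  loop (max start 0)

(* Scan backwards from [start]; return 0 if nothing matches. Index 0 is
   not tested since the result is 0 either way. *)
let find_prev ~sat s ~start =
  let rec loop i =
    if i <= 0 then 0 else if sat s.[i] then i else loop (i - 1)
  in
  loop (min start (String.length s - 1))
```

With this contract a result of 0 from find_prev is ambiguous: it means either a match at index 0 or no match at all, so callers that need to tell these apart must check byte 0 separately.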
keep_next_len ~sat s ~start
is the number of consecutive bytes satisfying sat, counted forwards from start included.
keep_prev_len ~sat s ~start
is the number of consecutive bytes satisfying sat, counted backwards from start included.
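A minimal sketch of the two counters in plain OCaml, assuming that an out-of-bounds start simply yields a shorter (possibly zero) count. It illustrates the description, not the module's code.

```ocaml
(* Count forwards from [start] (included) while [sat] holds. *)
let keep_next_len ~sat s ~start =
  let len = String.length s in
  let start = max start 0 in
  let rec loop i = if i < len && sat s.[i] then loop (i + 1) else i in
  loop start - start

(* Count backwards from [start] (included) while [sat] holds. *)
let keep_prev_len ~sat s ~start =
  let start = min start (String.length s - 1) in
  let rec loop i = if i >= 0 && sat s.[i] then loop (i - 1) else i in
  start - loop start
```

For example, with sat recognizing spaces, keep_next_len on "  ab" from 0 gives 2, and keep_prev_len on "ab  " from 3 also gives 2.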
lines s
splits s into lines separated by CR, CRLF or LF. This is [""] on the empty string.
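One way to implement lines is a single left-to-right scan that cuts at LF, at CR, or at the CRLF pair taken as one separator. The sketch below assumes that a trailing separator yields a final empty line, consistent with [""] on the empty string; it is illustrative, not the module's code.

```ocaml
let lines s =
  let len = String.length s in
  (* [first] is the start of the current line, [i] the byte being scanned. *)
  let rec loop acc first i =
    if i >= len then List.rev (String.sub s first (len - first) :: acc)
    else match s.[i] with
    | '\n' -> loop (String.sub s first (i - first) :: acc) (i + 1) (i + 1)
    | '\r' ->
        let line = String.sub s first (i - first) in
        (* A CR directly followed by LF is a single CRLF separator. *)
        let next = if i + 1 < len && s.[i + 1] = '\n' then i + 2 else i + 1 in
        loop (line :: acc) next next
    | _ -> loop acc first (i + 1)
  in
  loop [] 0 0
```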
find_next_eol s ~start
is either String.length s or the index of the first byte at or after start that satisfies is_eol.
find_prev_eol s ~start
is either 0 or the index of the first byte at or before start, scanning backwards, that satisfies is_eol.
find_prev_sol s ~start
is either 0 or the position right after the first byte at or before start, scanning backwards, that satisfies is_eol. This can be String.length s.
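is_eol is not defined in this section; the sketch below assumes it holds exactly on LF and CR. find_prev_sol differs from find_prev_eol in that it returns the position after the end-of-line byte, which is a start-of-line position, and so must also test index 0.

```ocaml
(* Assumption: is_eol recognizes LF and CR. *)
let is_eol c = c = '\n' || c = '\r'

let find_next_eol s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len else if is_eol s.[i] then i else loop (i + 1)
  in
  loop (max start 0)

let find_prev_eol s ~start =
  let rec loop i =
    if i <= 0 then 0 else if is_eol s.[i] then i else loop (i - 1)
  in
  loop (min start (String.length s - 1))

(* Index 0 is tested here: an end of line there makes 1 a start of line.
   The result can be String.length s when s ends with an end of line and
   start is at or past its last byte. *)
let find_prev_sol s ~start =
  let rec loop i =
    if i < 0 then 0 else if is_eol s.[i] then i + 1 else loop (i - 1)
  in
  loop (min start (String.length s - 1))
```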
utf_8_decode_len b
is the length of a UTF-8 encoded Unicode character starting with byte b. This is 1 on UTF-8 continuation or malformed bytes.
is_utf_8_decode c
is true iff c is not a UTF-8 continuation byte. This means c is either a UTF-8 start byte or a malformed UTF-8 byte.
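The byte classification can be sketched with char range patterns. The sketch assumes the RFC 3629 lead-byte ranges, so 0xC0, 0xC1 and 0xF5 to 0xFF count as malformed; the module may classify these edge bytes differently.

```ocaml
(* Lead-byte ranges follow RFC 3629 (an assumption of this sketch). *)
let utf_8_decode_len b =
  match b with
  | '\x00' .. '\x7F' -> 1
  | '\xC2' .. '\xDF' -> 2
  | '\xE0' .. '\xEF' -> 3
  | '\xF0' .. '\xF4' -> 4
  | _ -> 1 (* continuation bytes 0x80-0xBF or malformed bytes *)

(* Continuation bytes have the bit pattern 10xxxxxx. *)
let is_utf_8_decode c = Char.code c land 0xC0 <> 0x80
```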
find_next_utf_8_sync s ~start
is either String.length s or the index of the first byte at or after start that satisfies is_utf_8_decode.
find_prev_utf_8_decode s ~start
is either 0 or the index of the first byte at or before start, scanning backwards, that satisfies is_utf_8_decode.
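Resynchronizing on a character boundary amounts to a forward or backward byte scan specialized to is_utf_8_decode. A minimal sketch, with clamping of out-of-bounds start values as an assumption:

```ocaml
(* True on anything but a continuation byte (10xxxxxx). *)
let is_utf_8_decode c = Char.code c land 0xC0 <> 0x80

let find_next_utf_8_sync s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len else if is_utf_8_decode s.[i] then i else loop (i + 1)
  in
  loop (max start 0)

let find_prev_utf_8_decode s ~start =
  let rec loop i =
    if i <= 0 then 0 else if is_utf_8_decode s.[i] then i else loop (i - 1)
  in
  loop (min start (String.length s - 1))
```

Starting in the middle of the three-byte encoding of U+20AC in "a\xE2\x82\xACb", the forward scan from 2 lands on "b" at 4 and the backward scan from 3 lands on the euro sign's start byte at 1.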
is_white c
is true iff c is US-ASCII whitespace (0x20, 0x09, 0x0A, 0x0B, 0x0C or 0x0D).
find_next_white s ~start
is either String.length s or the first byte position at or after start for which is_white is true.
find_prev_white s ~start
is either 0 or the first byte position at or before start, scanning backwards, for which is_white is true.
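is_white and the two whitespace scanners, sketched in plain OCaml; clamping out-of-bounds start values is an assumption, as is everything beyond the byte list given above.

```ocaml
(* The six US-ASCII whitespace bytes listed above. *)
let is_white = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
  | _ -> false

let find_next_white s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len else if is_white s.[i] then i else loop (i + 1)
  in
  loop (max start 0)

let find_prev_white s ~start =
  let rec loop i =
    if i <= 0 then 0 else if is_white s.[i] then i else loop (i - 1)
  in
  loop (min start (String.length s - 1))
```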
find_next_after_eow
is either String.length s or the byte position of the first is_white byte reached after first skipping white and then non-white bytes, starting at start.
find_prev_sow
is either 0 or the byte position reached by skipping backwards, first over white and then over non-white bytes.
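The word-boundary scanners each compose two skips. In this sketch the backward scan includes the byte at start and returns the first byte of the word it reaches (0 at the beginning of the string); this reading of the description above, and the ~start argument of both functions, are assumptions.

```ocaml
let is_white = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
  | _ -> false

(* Skip white, then non-white, moving forwards from [start]. *)
let find_next_after_eow s ~start =
  let len = String.length s in
  let rec skip keep i = if i < len && keep s.[i] then skip keep (i + 1) else i in
  let i = skip is_white (max start 0) in
  min len (skip (fun c -> not (is_white c)) i)

(* Skip white, then non-white, moving backwards from [start] (included);
   the result is the first byte of the word reached. *)
let find_prev_sow s ~start =
  let rec skip keep i = if i >= 0 && keep s.[i] then skip keep (i - 1) else i in
  let i = skip is_white (min start (String.length s - 1)) in
  max 0 (skip (fun c -> not (is_white c)) i + 1)
```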
Note. This is a simple notion of grapheme cluster based on Uucp.Break.tty_width_hint.
find_next_gc s ~after
is either String.length s or the byte position of the grapheme cluster after the one starting at after.
find_next_gc_and_width s ~after
is like find_next_gc but also returns, in the second component, the tty width of the grapheme cluster at after.
find_prev_gc s ~before
is either 0 or the byte position of the grapheme cluster before the one starting at before.
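The grapheme cluster functions depend on Uucp, which is not in the standard library. The sketch below therefore takes a hypothetical ~width argument standing in for Uucp.Break.tty_width_hint, and assumes that a cluster is one character of non-zero width followed by any zero-width characters. The helpers uchar_at and uchar_len are also hypothetical; decoding uses String.get_utf_8_uchar (OCaml 4.14 or later).

```ocaml
(* Hypothetical helpers: decode the character starting at byte [i]. *)
let uchar_at s i = Uchar.utf_decode_uchar (String.get_utf_8_uchar s i)
let uchar_len s i = Uchar.utf_decode_length (String.get_utf_8_uchar s i)

(* [width] stands in for Uucp.Break.tty_width_hint (assumption). *)
let find_next_gc_and_width ~width s ~after =
  let len = String.length s in
  (* Skip the zero-width characters that extend the current cluster. *)
  let rec skip_zero_width i =
    if i >= len then len
    else if width (uchar_at s i) = 0 then skip_zero_width (i + uchar_len s i)
    else i
  in
  if after >= len then (len, 0)
  else
    (* Negative hints, if any, are clamped to 0 (assumption). *)
    let w = max 0 (width (uchar_at s after)) in
    (skip_zero_width (after + uchar_len s after), w)

let find_next_gc ~width s ~after =
  fst (find_next_gc_and_width ~width s ~after)

let find_prev_gc ~width s ~before =
  let is_start c = Char.code c land 0xC0 <> 0x80 in
  (* Index of the character start strictly before [i], or 0. *)
  let rec prev_start i =
    let i = i - 1 in
    if i <= 0 then 0 else if is_start s.[i] then i else prev_start i
  in
  (* Step back over zero-width characters to the cluster's first one. *)
  let rec loop i =
    let j = prev_start i in
    if j = 0 then 0 else if width (uchar_at s j) = 0 then loop j else j
  in
  loop (min before (String.length s))
```

With Uucp available, one would presumably pass ~width:Uucp.Break.tty_width_hint; for a quick check, a toy width that maps U+0301 (combining acute accent) to 0 and everything else to 1 makes "a\xCC\x81b" split into the clusters at bytes 0 and 3.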
find_prev_eol_and_tty_width s ~before
is either 0 or the index of the nearest byte before before that satisfies is_eol and, in the second component, the tty width needed to go from that index to before.