Down_std.Txt
UTF-8 text handling, possibly malformed.
Note. start, after and before arguments can be out of bounds and in particular equal to the string length. Finding forwards returns the string length if it cannot be found, finding backwards returns 0 if it cannot be found.
find_next ~sat s ~start is either String.length s or the index of the first byte at or after start that satisfies sat.
find_prev ~sat s ~start is either 0 or the index of the first byte at or before start that satisfies sat.
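The contract of these two functions can be illustrated with a minimal sketch. This is a re-implementation written from the descriptions above, not the module's own code. It assumes an out-of-bounds start is clamped into the string before scanning.

```ocaml
(* Sketch of the documented contract; not the module's implementation. *)
let find_next ~sat s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len (* not found: string length *) else
    if sat s.[i] then i else loop (i + 1)
  in
  loop (max 0 start)

let find_prev ~sat s ~start =
  let rec loop i =
    if i <= 0 then 0 (* not found, or reached the first byte: 0 *) else
    if sat s.[i] then i else loop (i - 1)
  in
  (* Assumption: a start past the end is clamped to the last byte. *)
  loop (min start (String.length s - 1))
```

Note that a result of 0 from find_prev is ambiguous: either byte 0 satisfies sat or nothing was found. Callers that need to distinguish the two must test s.[0] themselves.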
keep_next_len ~sat s ~start is the number of consecutive bytes satisfying sat, scanning forward from start (included).
keep_prev_len ~sat s ~start is the number of consecutive bytes satisfying sat, scanning backward from start (included).
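A minimal sketch of the two counting functions, written from the descriptions above rather than taken from the module. The clamping of an out-of-bounds start is an assumption.

```ocaml
(* Sketch of the documented contract; not the module's implementation. *)
let keep_next_len ~sat s ~start =
  let len = String.length s in
  let rec loop i count =
    if i >= len || not (sat s.[i]) then count else loop (i + 1) (count + 1)
  in
  loop (max 0 start) 0

let keep_prev_len ~sat s ~start =
  let rec loop i count =
    if i < 0 || not (sat s.[i]) then count else loop (i - 1) (count + 1)
  in
  (* Assumption: a start past the end is clamped to the last byte. *)
  loop (min start (String.length s - 1)) 0
```

For example, with a digit predicate on "ab123c", counting forward from index 2 and counting backward from index 4 both cover the run "123".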
lines s splits s into lines separated by CR, CRLF or LF. This is [""] on the empty string.
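One way to realise this splitting is sketched below. It is written from the description, not taken from the module. It assumes that a trailing line terminator yields a final empty line, which follows from treating CR, CRLF and LF as separators.

```ocaml
(* Sketch: split on CR, CRLF or LF, treating CRLF as a single separator. *)
let lines s =
  let len = String.length s in
  let rec loop acc first i =
    if i >= len then List.rev (String.sub s first (len - first) :: acc) else
    match s.[i] with
    | '\n' -> loop (String.sub s first (i - first) :: acc) (i + 1) (i + 1)
    | '\r' ->
        (* Consume a following LF so that CRLF counts once. *)
        let next = if i + 1 < len && s.[i + 1] = '\n' then i + 2 else i + 1 in
        loop (String.sub s first (i - first) :: acc) next next
    | _ -> loop acc first (i + 1)
  in
  loop [] 0 0
```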
find_next_eol s ~start is either String.length s or the index of the first byte at or after start that satisfies is_eol.
find_prev_eol s ~start is either 0 or the index of the first byte at or before start that satisfies is_eol.
find_prev_sol s ~start is either 0 or the position after the first byte at or before start that satisfies is_eol. This can be String.length s.
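The difference between the end-of-line and start-of-line searches can be sketched as follows. This is a re-implementation from the descriptions, and it assumes is_eol accepts exactly the LF and CR bytes.

```ocaml
(* Assumption: end-of-line bytes are LF and CR. *)
let is_eol c = c = '\n' || c = '\r'

let find_prev_eol s ~start =
  let rec loop i =
    if i <= 0 then 0 else
    if is_eol s.[i] then i else loop (i - 1)
  in
  loop (min start (String.length s - 1))

(* Start of line: one past the previous end-of-line byte, if there is one. *)
let find_prev_sol s ~start =
  let i = find_prev_eol s ~start in
  if i < String.length s && is_eol s.[i] then i + 1 else i
```

On "ab\ncd" with start 4, the end-of-line search yields 2 and the start-of-line search yields 3. On "ab\n" with start 2, the start-of-line search yields 3, which is the string length.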
utf_8_decode_len b is the length of a UTF-8 encoded Unicode character starting with byte b. This is 1 on UTF-8 continuation or malformed bytes.
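The mapping from first byte to sequence length follows the UTF-8 encoding rules. A minimal sketch, written from the description and the standard byte ranges rather than copied from the module:

```ocaml
(* Sketch: expected length of a UTF-8 sequence given its first byte. *)
let utf_8_decode_len = function
  | '\x00' .. '\x7F' -> 1 (* US-ASCII *)
  | '\xC2' .. '\xDF' -> 2
  | '\xE0' .. '\xEF' -> 3
  | '\xF0' .. '\xF4' -> 4
  | _ -> 1 (* continuation (0x80-0xBF) or malformed byte *)
```

Returning 1 on continuation and malformed bytes lets a caller always advance through possibly malformed input one decode at a time.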
is_utf_8_decode c is true iff c is not a UTF-8 continuation byte. This means c is either a UTF-8 start byte or a UTF-8 malformed byte.
find_next_utf_8_sync s ~start is either String.length s or the index of the first byte at or after start that satisfies is_utf_8_decode.
find_prev_utf_8_decode s ~start is either 0 or the index of the byte at or before start that satisfies is_utf_8_decode.
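Resynchronising on a decode boundary can be sketched as below. The code is written from the descriptions above, not taken from the module. It relies on the fact that UTF-8 continuation bytes are exactly those of the form 10xxxxxx.

```ocaml
(* A byte is a continuation byte iff its two top bits are 10. *)
let is_utf_8_decode c = Char.code c land 0xC0 <> 0x80

(* Sketch: first index at or after start that is not a continuation byte. *)
let find_next_utf_8_sync s ~start =
  let len = String.length s in
  let rec loop i =
    if i >= len then len else
    if is_utf_8_decode s.[i] then i else loop (i + 1)
  in
  loop (max 0 start)
```

In "a\xC3\xA9b" (the text "aéb"), index 2 falls in the middle of "é", so searching forward from 2 yields 3, the index of "b".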
is_white c is true iff c is US-ASCII whitespace (0x20, 0x09, 0x0A, 0x0B, 0x0C or 0x0D).
find_next_white s ~start is either String.length s or the first byte position at or after start such that is_white is true.
find_prev_white s ~start is either 0 or the first byte position at or before start such that is_white is true.
find_next_after_eow s ~start is either String.length s or the byte position of the first is_white byte found after first skipping white and then non-white, starting at start.
find_prev_sow is either 0 or the byte position after skipping backward first white and then non-white.
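The forward word motion can be sketched as two successive skips. This is an illustration written from the descriptions, not the module's code. The argument list s ~start and the local helper skip are assumptions.

```ocaml
(* US-ASCII whitespace: 0x20 and 0x09 to 0x0D. *)
let is_white = function ' ' | '\t' .. '\r' -> true | _ -> false

(* Sketch: skip white, then skip non-white, from start. *)
let find_next_after_eow s ~start =
  let len = String.length s in
  (* skip p i is the first index at or after i whose byte fails p. *)
  let rec skip p i = if i < len && p s.[i] then skip p (i + 1) else i in
  let i = skip is_white (max 0 start) in
  skip (fun c -> not (is_white c)) i
```

On "foo  bar baz", starting at 3 (inside the run of spaces) skips the two spaces and then "bar", and stops at index 8. Starting at 9 runs to the end and yields the string length, 12.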
Note. This is a simple notion of grapheme cluster based on Uucp.Break.tty_width_hint.
find_next_gc s ~after is String.length s or the byte position of the grapheme cluster after the one starting at after.
find_next_gc_and_width s ~after is like find_next_gc but also returns in the second component the tty width of the grapheme cluster at after.
find_prev_gc s ~before is 0 or the byte position of the grapheme cluster before the one starting at before.
find_prev_eol_and_tty_width s ~before is either 0 or the index of the byte preceding before that satisfies is_eol and, in the second component, the tty width needed to go from that index to before.