Module String.Sub

Substrings.

A substring defines a possibly empty subsequence of bytes in a base string.

The positions of a string s of length l are the slits found before each byte and after the last byte of the string. They are labelled from left to right by increasing number in the range [0;l].

positions  0   1   2   3   4    l-1    l
           +---+---+---+---+     +-----+
  indices  | 0 | 1 | 2 | 3 | ... | l-1 |
           +---+---+---+---+     +-----+

The byte at index i lies between positions i and i+1.

Formally we define a substring of s as being a subsequence of bytes defined by a start and a stop position. The former is always smaller than or equal to the latter. When both positions are equal the substring is empty. Note that for a given base string there are as many empty substrings as there are positions in the string.

Like in strings, we index the bytes of a substring using zero-based indices.
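The difference between positions and indices can be sketched as follows. This is only an illustration: it assumes the astring library is installed, and the alias Sub and the names base, b and empties are chosen for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "abcd"                 (* length l = 4, so positions 0..4 *)

(* The byte at index 1 lies between positions 1 and 2. *)
let b = Sub.v ~start:1 ~stop:2 base

(* One empty substring per position, that is l + 1 of them. *)
let empties =
  List.init (String.length base + 1) (fun p -> Sub.v ~start:p ~stop:p base)

let () =
  assert (Sub.to_string b = "b");
  assert (Sub.get b 0 = 'b');     (* zero-based index inside the substring *)
  assert (List.length empties = 5);
  assert (List.for_all Sub.is_empty empties)
```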

See how to use substrings to parse data.

Substrings

type t = sub

The type for substrings.

val empty : sub

empty is the empty substring of the empty string String.empty.

val v : ?start:int -> ?stop:int -> string -> sub

v ~start ~stop s is the substring of s that starts at position start (defaults to 0) and stops at position stop (defaults to String.length s).

  • raises Invalid_argument

    if start or stop are not positions of s or if stop < start.
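A minimal sketch of the defaults and of the error case, assuming the astring library is available. The helper raises_invalid is hypothetical and exists only for this example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let s = "Revolt now!"             (* 11 bytes, positions 0..11 *)

(* Example-only helper: true if f () raises Invalid_argument. *)
let raises_invalid f =
  match f () with
  | _ -> false
  | exception Invalid_argument _ -> true

let () =
  assert (Sub.to_string (Sub.v s) = "Revolt now!");
  assert (Sub.to_string (Sub.v ~start:7 s) = "now!");
  assert (Sub.to_string (Sub.v ~stop:6 s) = "Revolt");
  (* Position 11 is valid: it is the slit after the last byte. *)
  assert (Sub.is_empty (Sub.v ~start:11 s));
  (* 12 is not a position of s, and stop < start is rejected. *)
  assert (raises_invalid (fun () -> Sub.v ~start:12 s));
  assert (raises_invalid (fun () -> Sub.v ~start:4 ~stop:2 s))
```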

val start_pos : sub -> int

start_pos s is s's start position in the base string.

val stop_pos : sub -> int

stop_pos s is s's stop position in the base string.

val base_string : sub -> string

base_string s is s's base string.

val length : sub -> int

length s is the number of bytes in s.

val get : sub -> int -> char

get s i is the byte of s at its zero-based index i.

  • raises Invalid_argument

    if i is not an index of s.

val get_byte : sub -> int -> int

get_byte s i is Char.to_int (get s i).

val head : ?rev:bool -> sub -> char option

head s is Some (get s h) with h = 0 if rev = false (default) or h = length s - 1 if rev = true. None is returned if s is empty.

val get_head : ?rev:bool -> sub -> char

get_head s is like head but

  • raises Invalid_argument

    if s is empty.
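The accessors above might be used as in the following sketch, which assumes the astring library is installed. The name volt is chosen for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let volt = Sub.v ~start:2 ~stop:6 "Revolt now!"   (* "volt" *)

let () =
  (* Indices are relative to the substring, not to the base string. *)
  assert (Sub.get volt 0 = 'v');
  assert (Sub.get_byte volt 0 = 0x76);
  assert (Sub.head volt = Some 'v');
  assert (Sub.head ~rev:true volt = Some 't');
  assert (Sub.get_head ~rev:true volt = 't');
  (* On an empty substring head returns None, get_head raises. *)
  assert (Sub.head Sub.empty = None);
  assert (match Sub.get_head Sub.empty with
          | _ -> false
          | exception Invalid_argument _ -> true)
```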

val of_string : string -> sub

of_string s is v s.

val to_string : sub -> string

to_string s is the bytes of s as a string.

val rebase : sub -> sub

rebase s is v (to_string s). This puts s on a base string made solely of its bytes.
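The relation between a substring, its base string and rebase can be sketched as below, assuming the astring library is installed. The names base, volt and rebased are illustrative.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"
let volt = Sub.v ~start:2 ~stop:6 base
let rebased = Sub.rebase volt

let () =
  (* The substring keeps a reference to the original base string. *)
  assert (Sub.base_string volt == base);
  assert (Sub.start_pos volt = 2 && Sub.stop_pos volt = 6);
  assert (Sub.to_string volt = "volt");
  (* After rebase the base string holds only the bytes of the substring. *)
  assert (Sub.base_string rebased = "volt");
  assert (Sub.start_pos rebased = 0 && Sub.stop_pos rebased = 4);
  assert (Sub.equal_bytes volt rebased)
```

Rebasing can be useful when a small substring would otherwise keep a large base string alive.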

val hash : sub -> int

hash s is Hashtbl.hash s.

Stretching substrings

See the graphical guide.

val start : sub -> sub

start s is the empty substring at the start position of s.

val stop : sub -> sub

stop s is the empty substring at the stop position of s.

val base : sub -> sub

base s is a substring that spans the whole base string of s.

val tail : ?rev:bool -> sub -> sub

tail s is s without its first (rev is false, default) or last (rev is true) byte or s if it is empty.

val extend : ?rev:bool -> ?max:int -> ?sat:(char -> bool) -> sub -> sub

extend ~rev ~max ~sat s extends s by at most max consecutive sat satisfying bytes of the base string located after stop s (rev is false, default) or before start s (rev is true). If max is unspecified the extension is limited by the extents of the base string of s. sat defaults to fun _ -> true.

  • raises Invalid_argument

    if max is negative.

val reduce : ?rev:bool -> ?max:int -> ?sat:(char -> bool) -> sub -> sub

reduce ~rev ~max ~sat s reduces s by at most max consecutive sat satisfying bytes of s located before stop s (rev is false, default) or after start s (rev is true). If max is unspecified the reduction is limited by the extents of the substring s. sat defaults to fun _ -> true.

  • raises Invalid_argument

    if max is negative.
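The following sketch illustrates extend and reduce on the substring "volt" of "Revolt now!", assuming the astring library is installed. The predicate is_letter and the shorthand str are defined only for this example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"
let a = Sub.v ~start:2 ~stop:6 base               (* "volt" *)

(* Example-only helpers. *)
let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
let str = Sub.to_string

let () =
  (* Without max or sat, extension runs to the end of the base string. *)
  assert (str (Sub.extend a) = "volt now!");
  assert (str (Sub.extend ~rev:true a) = "Revolt");
  assert (str (Sub.extend ~max:1 a) = "volt ");
  (* The byte after stop a is a space, so nothing is added. *)
  assert (str (Sub.extend ~sat:is_letter a) = "volt");
  (* reduce drops bytes before stop a, or after start a with ~rev:true. *)
  assert (str (Sub.reduce ~max:1 a) = "vol");
  assert (str (Sub.reduce ~rev:true ~max:1 a) = "olt");
  assert (str (Sub.reduce ~sat:(fun c -> c = 't') a) = "vol")
```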

val extent : sub -> sub -> sub

extent s s' is the smallest substring that includes all the positions of s and s'.

  • raises Invalid_argument

    if s and s' are not on the same base string according to physical equality.

val overlap : sub -> sub -> sub option

overlap s s' is the smallest substring that includes all the positions common to s and s' or None if there are no such positions. Note that the overlap substring may be empty.

  • raises Invalid_argument

    if s and s' are not on the same base string according to physical equality.
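A small sketch of extent and overlap, assuming the astring library is installed. The substrings a, b, c and d mirror those of the graphical guide; other is a hypothetical substring on a different base string.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"
let a = Sub.v ~start:2 ~stop:6 base               (* "volt" *)
let b = Sub.v ~start:0 ~stop:3 base               (* "Rev" *)
let c = Sub.v ~start:7 ~stop:7 base               (* empty, at position 7 *)
let d = Sub.v ~start:7 base                       (* "now!" *)
let other = Sub.v "another base"

let () =
  assert (Sub.to_string (Sub.extent a b) = "Revolt");
  assert (Sub.to_string (Sub.extent a c) = "volt ");
  (match Sub.overlap a b with
   | Some o -> assert (Sub.to_string o = "v")
   | None -> assert false);
  (* a stops at position 6 and c sits at 7: no common position. *)
  assert (Sub.overlap a c = None);
  (* c and d share position 7 only: the overlap exists but is empty. *)
  (match Sub.overlap d c with
   | Some o -> assert (Sub.is_empty o && Sub.start_pos o = 7)
   | None -> assert false);
  (* Substrings on different base strings are rejected. *)
  assert (match Sub.extent a other with
          | _ -> false
          | exception Invalid_argument _ -> true)
```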

Appending substrings

val append : sub -> sub -> sub

append s s' is like String.append. The substrings can be on different bases and the result is on a base string that holds exactly the appended bytes.

val concat : ?sep:sub -> sub list -> sub

concat ~sep ss is like String.concat. The substrings can all be on different bases and the result is on a base string that holds exactly the concatenated bytes.
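As an illustration, the sketch below appends and concatenates substrings taken from two different base strings. It assumes the astring library is installed; the names hello and there are chosen for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let hello = Sub.v ~stop:5 "hello world"           (* "hello" *)
let there = Sub.v ~start:3 "hi there"             (* "there" *)

let appended = Sub.append hello there
let joined = Sub.concat ~sep:(Sub.v ", ") [hello; there]

let () =
  assert (Sub.to_string appended = "hellothere");
  assert (Sub.to_string joined = "hello, there");
  (* The result lives on a fresh base holding exactly its bytes. *)
  assert (Sub.base_string appended = "hellothere");
  assert (Sub.base_string joined = "hello, there")
```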

Predicates

val is_empty : sub -> bool

is_empty s is length s = 0.

val is_prefix : affix:sub -> sub -> bool

is_prefix is like String.is_prefix. Only bytes are compared, affix can be on a different base string.

val is_infix : affix:sub -> sub -> bool

is_infix is like String.is_infix. Only bytes are compared, affix can be on a different base string.

val is_suffix : affix:sub -> sub -> bool

is_suffix is like String.is_suffix. Only bytes are compared, affix can be on a different base string.

val for_all : (char -> bool) -> sub -> bool

for_all is like String.for_all on the substring.

val exists : (char -> bool) -> sub -> bool

exists is like String.exists on the substring.

val same_base : sub -> sub -> bool

same_base s s' is true iff the substrings s and s' have the same base string according to physical equality.

val equal_bytes : sub -> sub -> bool

equal_bytes s s' is true iff the substrings s and s' have exactly the same bytes. The substrings can be on a different base string.

val compare_bytes : sub -> sub -> int

compare_bytes s s' compares the bytes of s and s' in lexicographical order. The substrings can be on a different base string.

val equal : sub -> sub -> bool

equal s s' is true iff s and s' have the same positions.

  • raises Invalid_argument

    if s and s' are not on the same base string according to physical equality.

val compare : sub -> sub -> int

compare s s' compares the positions of s and s' in lexicographical order.

  • raises Invalid_argument

    if s and s' are not on the same base string according to physical equality.
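The distinction between comparing bytes and comparing positions can be sketched as follows, assuming the astring library is installed. The names left, right and elsewhere are illustrative.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "abab"
let left = Sub.v ~start:0 ~stop:2 base            (* "ab" at [0;2] *)
let right = Sub.v ~start:2 ~stop:4 base           (* "ab" at [2;4] *)
let elsewhere = Sub.v ~start:2 "xxab"             (* "ab" on another base *)

let () =
  (* Same bytes, different positions. *)
  assert (Sub.equal_bytes left right);
  assert (Sub.compare_bytes left right = 0);
  assert (not (Sub.equal left right));
  assert (Sub.compare left right < 0);
  (* Byte predicates work across base strings. *)
  assert (Sub.equal_bytes left elsewhere);
  assert (Sub.is_suffix ~affix:elsewhere (Sub.v base));
  assert (not (Sub.same_base left elsewhere));
  (* Position predicates require a common base string. *)
  assert (match Sub.equal left elsewhere with
          | _ -> false
          | exception Invalid_argument _ -> true)
```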

Extracting substrings

Extracted substrings are always on the same base string as the substring s acted upon.

val with_range : ?first:int -> ?len:int -> sub -> sub

with_range is like String.sub_with_range. The indices are the substring's zero-based ones, not those in the base string.

val with_index_range : ?first:int -> ?last:int -> sub -> sub

with_index_range is like String.sub_with_index_range. The indices are the substring's zero-based ones, not those in the base string.
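A short sketch of how range indices relate to base string positions, assuming the astring library is installed. The names volt, by_len and by_last are chosen for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let volt = Sub.v ~start:2 ~stop:6 "Revolt now!"   (* "volt" *)

(* Both select the bytes at substring indices 1 and 2. *)
let by_len = Sub.with_range ~first:1 ~len:2 volt
let by_last = Sub.with_index_range ~first:1 ~last:2 volt

let () =
  assert (Sub.to_string by_len = "ol");
  assert (Sub.to_string by_last = "ol");
  (* Index 1 of volt corresponds to position 3 of the base string. *)
  assert (Sub.start_pos by_len = 3 && Sub.stop_pos by_len = 5);
  assert (Sub.equal by_len by_last)
```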

val trim : ?drop:(char -> bool) -> sub -> sub

trim is like String.trim. If all bytes are dropped, returns an empty substring located in the middle of the argument.

val span : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub * sub

span is like String.span. For a substring s a left empty span is start s and a right empty span is stop s.

val take : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub

take is like String.take.

val drop : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub

drop is like String.drop.
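As a hedged illustration of span, take and drop, the sketch below splits a substring on a run of digits. It assumes the astring library is installed; the predicates is_digit and is_letter are defined only for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

(* Example-only predicates. *)
let is_digit c = '0' <= c && c <= '9'
let is_letter c = 'a' <= c && c <= 'z'

let s = Sub.v "2024-rest"
let digits, rest = Sub.span ~sat:is_digit s
let before, letters = Sub.span ~rev:true ~sat:is_letter s

let () =
  assert (Sub.to_string digits = "2024");
  assert (Sub.to_string rest = "-rest");
  (* With ~rev:true the satisfying bytes are taken from the end. *)
  assert (Sub.to_string before = "2024-");
  assert (Sub.to_string letters = "rest");
  (* take and drop each return one side of the corresponding span. *)
  assert (Sub.to_string (Sub.take ~sat:is_digit s) = "2024");
  assert (Sub.to_string (Sub.drop ~sat:is_digit s) = "-rest")
```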

val cut : ?rev:bool -> sep:sub -> sub -> (sub * sub) option

cut is like String.cut. sep can be on a different base string.

val cuts : ?rev:bool -> ?empty:bool -> sep:sub -> sub -> sub list

cuts is like String.cuts. sep can be on a different base string.

val fields : ?empty:bool -> ?is_sep:(char -> bool) -> sub -> sub list

fields is like String.fields.
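The extraction functions can be combined to parse a line without copying bytes, as in this sketch. It assumes the astring library is installed; the input line and the names key, value and items are made up for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let line = "  retries = 3,5,,8  "
let s = Sub.trim (Sub.v line)

(* Split on the first "=" and trim both sides. *)
let key, value =
  match Sub.cut ~sep:(Sub.v "=") s with
  | Some (k, v) -> Sub.trim k, Sub.trim v
  | None -> s, Sub.stop s

(* Split the value on commas, dropping empty items. *)
let items = Sub.cuts ~empty:false ~sep:(Sub.v ",") value

let () =
  assert (Sub.to_string key = "retries");
  assert (Sub.to_string value = "3,5,,8");
  assert (List.map Sub.to_string items = ["3"; "5"; "8"]);
  (* Every extracted substring still points into the original line. *)
  assert (Sub.base_string key == line);
  assert (Sub.same_base key value);
  assert (Sub.start_pos key = 2)
```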

Traversing substrings

val find : ?rev:bool -> (char -> bool) -> sub -> sub option

find ~rev sat s is the substring of s (if any) that spans the first byte that satisfies sat in s after position start s (rev is false, default) or before stop s (rev is true). None is returned if there is no matching byte in s.

val find_sub : ?rev:bool -> sub:sub -> sub -> sub option

find_sub ~rev ~sub s is the substring of s (if any) that spans the first match of sub in s after position start s (rev is false, default) or before stop s (rev is true). Only bytes are compared and sub can be on a different base string. None is returned if there is no match of sub in s.
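A sketch of find and find_sub, assuming the astring library is installed. The helper pos is hypothetical and only reports the positions of a match.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let s = Sub.v "Revolt now!"

(* Example-only helper: start and stop positions of a match, if any. *)
let pos = function
  | Some m -> Some (Sub.start_pos m, Sub.stop_pos m)
  | None -> None

let is_o c = c = 'o'

let () =
  (* The first 'o' is at index 3, the last one at index 8. *)
  assert (pos (Sub.find is_o s) = Some (3, 4));
  assert (pos (Sub.find ~rev:true is_o s) = Some (8, 9));
  (* The needle may live on a different base string. *)
  assert (pos (Sub.find_sub ~sub:(Sub.v "now") s) = Some (7, 10));
  assert (pos (Sub.find_sub ~sub:(Sub.v "later") s) = None)
```

Because the result is itself a substring of s, it can be passed to functions such as extend or extent to grow the match in place.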

val filter : (char -> bool) -> sub -> sub

filter sat s is like String.filter. The result is on a base string that holds only the filtered bytes.

val filter_map : (char -> char option) -> sub -> sub

filter_map f s is like String.filter_map. The result is on a base string that holds only the filtered bytes.

val map : (char -> char) -> sub -> sub

map is like String.map. The result is on a base string that holds only the mapped bytes.

val mapi : (int -> char -> char) -> sub -> sub

mapi is like String.mapi. The result is on a base string that holds only the mapped bytes. The indices are the substring's zero-based ones, not those in the base string.
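The following sketch shows that filter, map and mapi produce substrings on a new base string. It assumes the astring library is installed; the names upper, capital and no_l are chosen for the example.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let volt = Sub.v ~start:2 ~stop:6 "Revolt now!"   (* "volt" *)

let upper = Sub.map Char.uppercase_ascii volt
(* The index passed to the function is relative to the substring. *)
let capital =
  Sub.mapi (fun i c -> if i = 0 then Char.uppercase_ascii c else c) volt
let no_l = Sub.filter (fun c -> c <> 'l') volt

let () =
  assert (Sub.to_string upper = "VOLT");
  assert (Sub.to_string capital = "Volt");
  assert (Sub.to_string no_l = "vot");
  (* The results no longer share the base string of volt. *)
  assert (Sub.base_string upper = "VOLT");
  assert (not (Sub.same_base upper volt))
```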

val fold_left : ('a -> char -> 'a) -> 'a -> sub -> 'a

fold_left is like String.fold_left.

val fold_right : (char -> 'a -> 'a) -> sub -> 'a -> 'a

fold_right is like String.fold_right.

val iter : (char -> unit) -> sub -> unit

iter is like String.iter.

val iteri : (int -> char -> unit) -> sub -> unit

iteri is like String.iteri. The indices are the substring's zero-based ones, not those in the base string.
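Folding and iterating over a substring only visit its own bytes, as this sketch suggests. It assumes the astring library is installed; the names now, reversed and indices are illustrative.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let now = Sub.v ~start:7 "Revolt now!"            (* "now!" *)

(* Collect the bytes in reverse order with a left fold. *)
let reversed = Sub.fold_left (fun acc c -> c :: acc) [] now

(* Record the indices seen by iteri. *)
let indices =
  let acc = ref [] in
  Sub.iteri (fun i _ -> acc := i :: !acc) now;
  List.rev !acc

let () =
  assert (reversed = ['!'; 'w'; 'o'; 'n']);
  assert (Sub.fold_right (fun c acc -> c :: acc) now [] = ['n'; 'o'; 'w'; '!']);
  (* Indices start at 0 even though now starts at position 7. *)
  assert (indices = [0; 1; 2; 3])
```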

Pretty printing

val pp : Stdlib.Format.formatter -> sub -> unit

pp ppf s prints s's bytes on ppf.

val dump : Stdlib.Format.formatter -> sub -> unit

dump ppf s prints s as a syntactically valid OCaml string on ppf using Ascii.escape_string.

val dump_raw : Stdlib.Format.formatter -> sub -> unit

dump_raw ppf s prints an unspecified raw internal representation of s on ppf.
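The printers are meant to be used with the %a directive of Format, as in this sketch. It assumes the astring library is installed. Only the output of pp is checked, since the exact output of dump and dump_raw is not spelled out here.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let volt = Sub.v ~start:2 ~stop:6 "Revolt now!"   (* "volt" *)

(* pp prints the bytes of the substring as they are. *)
let printed = Format.asprintf "%a" Sub.pp volt

let () =
  assert (printed = "volt");
  (* dump and dump_raw are mostly useful for debugging. *)
  Format.printf "%a@.%a@." Sub.dump volt Sub.dump_raw volt
```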

OCaml base type conversions

val of_char : char -> sub

of_char c is a substring that contains only the byte c.

val to_char : sub -> char option

to_char s is the single byte in s or None if there is no byte or more than one in s.

val of_bool : bool -> sub

of_bool b is a string representation for b. Relies on Stdlib.string_of_bool.

val to_bool : sub -> bool option

to_bool s is a bool from s, if any. Relies on Stdlib.bool_of_string.

val of_int : int -> sub

of_int i is a string representation for i. Relies on Stdlib.string_of_int.

val to_int : sub -> int option

to_int s is an int from s, if any. Relies on Stdlib.int_of_string.

val of_nativeint : nativeint -> sub

of_nativeint i is a string representation for i. Relies on Nativeint.to_string.

val to_nativeint : sub -> nativeint option

to_nativeint s is a nativeint from s, if any. Relies on Nativeint.of_string.

val of_int32 : int32 -> sub

of_int32 i is a string representation for i. Relies on Int32.to_string.

val to_int32 : sub -> int32 option

to_int32 s is an int32 from s, if any. Relies on Int32.of_string.

val of_int64 : int64 -> sub

of_int64 i is a string representation for i. Relies on Int64.to_string.

val to_int64 : sub -> int64 option

to_int64 s is an int64 from s, if any. Relies on Int64.of_string.

val of_float : float -> sub

of_float f is a string representation for f. Relies on Stdlib.string_of_float.

val to_float : sub -> float option

to_float s is a float from s, if any. Relies on Stdlib.float_of_string.
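The conversions can be applied directly to a substring of a larger input, as sketched below. The example assumes the astring library is installed; the input src and the name num are made up.

```ocaml
(* Assumes the astring library is linked; Sub is a local alias. *)
module Sub = Astring.String.Sub

let src = "x = 42;"
let num = Sub.v ~start:4 ~stop:6 src              (* "42" *)

let () =
  (* Only the bytes of the substring are parsed, not the whole base. *)
  assert (Sub.to_int num = Some 42);
  assert (Sub.to_int (Sub.v src) = None);
  assert (Sub.to_bool (Sub.v "true") = Some true);
  assert (Sub.to_float (Sub.v "1.5") = Some 1.5);
  (* to_char needs exactly one byte. *)
  assert (Sub.to_char (Sub.v "a") = Some 'a');
  assert (Sub.to_char (Sub.v "ab") = None);
  assert (Sub.to_string (Sub.of_int 42) = "42")
```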

Substring stretching graphical guide

+---+---+---+---+---+---+---+---+---+---+---+
| R | e | v | o | l | t |   | n | o | w | ! |
+---+---+---+---+---+---+---+---+---+---+---+
        |---------------|                      a
        |                                      start a
                        |                      stop a
            |-----------|                      tail a
        |-----------|                          tail ~rev:true a
        |-----------------------------------|  extend a
|-----------------------|                      extend ~rev:true a
|-------------------------------------------|  base a
|-----------|                                  b
|                                              start b
            |                                  stop b
    |-------|                                  tail b
|-------|                                      tail ~rev:true b
|-------------------------------------------|  extend b
|-----------|                                  extend ~rev:true b
|-------------------------------------------|  base b
|-----------------------|                      extent a b
        |---|                                  overlap a b
                            |                  c
                            |                  start c
                            |                  stop c
                            |                  tail c
                            |                  tail ~rev:true c
                            |---------------|  extend c
|---------------------------|                  extend ~rev:true c
|-------------------------------------------|  base c
        |-------------------|                  extent a c
                                         None  overlap a c
                            |---------------|  d
                            |                  start d
                                            |  stop d
                                |-----------|  tail d
                            |-----------|      tail ~rev:true d
                            |---------------|  extend d
|-------------------------------------------|  extend ~rev:true d
|-------------------------------------------|  base d
                            |---------------|  extent d c
                            |                  overlap d c