Module Cmarkit_base.Text

Textual content.

Functions to ensure UTF-8 validity, unescape CommonMark backslash escapes, and resolve numeric and named character references.

val utf_8_clean_unesc_unref : Stdlib.Buffer.t -> string -> first:int -> last:int -> string

utf_8_clean_unesc_unref b s ~first ~last unescapes CommonMark escapes, resolves HTML entity and character references in the given span, and replaces U+0000 and UTF-8 decoding errors by Uchar.rep. b is used as scratch space. If first > last, or if first and last are not valid indices of s, the result is "".
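The unescaping step can be illustrated with a minimal sketch. The function name unescape and its treatment of last as an inclusive index are assumptions made for illustration; this is not Cmarkit's code, and entity/character reference resolution and UTF-8 cleaning are deliberately omitted. Following the CommonMark specification, a backslash before an ASCII punctuation character is dropped, and any other backslash is kept literally.

```ocaml
(* Hypothetical sketch, not Cmarkit's implementation: CommonMark backslash
   unescaping on s.[first..last], assuming [last] is inclusive.
   Reference resolution and UTF-8 cleaning are intentionally left out. *)

(* ASCII punctuation as defined by CommonMark: 0x21-0x2F, 0x3A-0x40,
   0x5B-0x60 and 0x7B-0x7E. *)
let is_ascii_punct = function
  | '!' .. '/' | ':' .. '@' | '[' .. '`' | '{' .. '~' -> true
  | _ -> false

let unescape b s ~first ~last =
  if first > last || first < 0 || last >= String.length s then "" else begin
    Buffer.reset b;
    let i = ref first in
    while !i <= last do
      let c = s.[!i] in
      if c = '\\' && !i < last && is_ascii_punct s.[!i + 1] then begin
        (* Escaped punctuation: drop the backslash, keep the character. *)
        Buffer.add_char b s.[!i + 1];
        i := !i + 2
      end else begin
        (* Any other byte, including a lone or trailing backslash, is copied. *)
        Buffer.add_char b c;
        incr i
      end
    done;
    Buffer.contents b
  end
```

For example, unescape b "\\*a\\*" ~first:0 ~last:4 yields "*a*", while "\\a" is left unchanged because a is not punctuation.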

val utf_8_clean_unref : Stdlib.Buffer.t -> string -> first:int -> last:int -> string

utf_8_clean_unref b s ~first ~last is like utf_8_clean_unesc_unref but does not unescape CommonMark escapes.
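The reference-resolution part, without any unescaping, can be sketched for numeric character references. The names parse_numeric_ref and resolve_numeric_refs are hypothetical, last is assumed inclusive, and named references such as &amp; as well as UTF-8 cleaning of the surrounding text are not handled. The digit limits and the mapping of U+0000 and invalid code points to U+FFFD follow the CommonMark specification. Backslashes pass through untouched.

```ocaml
(* Hypothetical sketch, not Cmarkit's implementation. Resolves CommonMark
   numeric character references in s.[first..last] (last inclusive):
   &#D; with 1-7 decimal digits and &#xH; / &#XH; with 1-6 hex digits.
   Named references and UTF-8 cleaning are not handled. *)

(* Parses a reference starting at s.[i]; returns the code point and the
   index just after the terminating ';'. *)
let parse_numeric_ref s i last =
  let digit_value ~hex c = match c with
    | '0' .. '9' -> Some (Char.code c - 0x30)
    | 'a' .. 'f' when hex -> Some (Char.code c - 0x57)
    | 'A' .. 'F' when hex -> Some (Char.code c - 0x37)
    | _ -> None
  in
  if i + 2 > last || s.[i] <> '&' || s.[i + 1] <> '#' then None else
  let hex = s.[i + 2] = 'x' || s.[i + 2] = 'X' in
  let start = if hex then i + 3 else i + 2 in
  let max_digits = if hex then 6 else 7 in
  let base = if hex then 16 else 10 in
  let rec digits j n count =
    if j > last then None else
    match digit_value ~hex s.[j] with
    | Some d when count < max_digits -> digits (j + 1) (n * base + d) (count + 1)
    | Some _ -> None (* too many digits: not a reference *)
    | None -> if s.[j] = ';' && count > 0 then Some (n, j + 1) else None
  in
  digits start 0 0

let resolve_numeric_refs b s ~first ~last =
  if first > last || first < 0 || last >= String.length s then "" else begin
    Buffer.reset b;
    let rec loop i =
      if i <= last then begin
        match parse_numeric_ref s i last with
        | Some (n, next) ->
            (* U+0000 and non-scalar values (surrogates, > U+10FFFF) become U+FFFD. *)
            let u = if n <> 0 && Uchar.is_valid n then Uchar.of_int n else Uchar.rep in
            Buffer.add_utf_8_uchar b u;
            loop next
        | None ->
            Buffer.add_char b s.[i];
            loop (i + 1)
      end
    in
    loop first;
    Buffer.contents b
  end
```

For instance, "\\&#35;" becomes "\\#": the reference is resolved but the backslash is kept, which is the behavior that distinguishes the no-unescape variant.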

val utf_8_clean_raw : ?pad:int -> Stdlib.Buffer.t -> string -> first:int -> last:int -> string

utf_8_clean_raw ?pad b s ~first ~last replaces U+0000 and UTF-8 decoding errors in the given span by Uchar.rep. b is used as scratch space. pad (defaults to 0) is the number of U+0020 spaces to prepend to the result. If first > last, or if first and last are not valid indices of s, the result is either "" or the padded string.
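The cleaning behavior can be approximated with the Stdlib UTF-8 decoder (OCaml 4.14 or later). The name clean_raw, the inclusive reading of last, and returning only the padding for an invalid span are assumptions; this is a sketch of the described behavior, not Cmarkit's implementation.

```ocaml
(* Hypothetical sketch, not Cmarkit's implementation. Replaces U+0000 and
   UTF-8 decoding errors in s.[first..last] (last inclusive) by Uchar.rep,
   after prepending [pad] U+0020 spaces. Requires OCaml >= 4.14. *)
let clean_raw ?(pad = 0) b s ~first ~last =
  Buffer.reset b;
  if pad > 0 then Buffer.add_string b (String.make pad ' ');
  let valid_span = 0 <= first && first <= last && last < String.length s in
  if valid_span then begin
    (* Decode only the span so a sequence cannot run past [last]. *)
    let span = String.sub s first (last - first + 1) in
    let rec loop i =
      if i < String.length span then begin
        let d = String.get_utf_8_uchar span i in
        let u = Uchar.utf_decode_uchar d in
        let u =
          if not (Uchar.utf_decode_is_valid d) || Uchar.equal u Uchar.min
          then Uchar.rep else u
        in
        Buffer.add_utf_8_uchar b u;
        (* Invalid decodes still consume at least one byte. *)
        loop (i + Uchar.utf_decode_length d)
      end
    in
    loop 0
  end;
  (* For an invalid span only the padding (possibly "") is returned. *)
  Buffer.contents b
```

Decoding a copy of the span with String.sub costs an allocation but keeps a multi-byte sequence that straddles last from being read past the span boundary.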