module Break: sig .. end
These properties are mainly for the Unicode text segmentation (UAX #29) and line breaking (UAX #14) algorithms.
type line =
[ `AI
| `AL
| `B2
| `BA
| `BB
| `BK
| `CB
| `CJ
| `CL
| `CM
| `CP
| `CR
| `EX
| `GL
| `H2
| `H3
| `HL
| `HY
| `ID
| `IN
| `IS
| `JL
| `JT
| `JV
| `LF
| `NL
| `NS
| `NU
| `OP
| `PO
| `PR
| `QU
| `RI
| `SA
| `SG
| `SP
| `SY
| `WJ
| `XX
| `ZW ]
val pp_line : Format.formatter -> line -> unit
pp_line ppf l prints an unspecified representation of l on ppf.
val line : Uucp.uchar -> line
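Because line is a polymorphic variant, its values can be matched directly. As a minimal sketch (the helper name is hypothetical, not part of Uucp), the function below recognizes the classes for which UAX #14 rules LB4 and LB5 force a break after the character: BK, CR, LF and NL. It ignores every other line breaking rule, including the CR × LF exception, which the caller must handle.

```ocaml
(* Hypothetical helper: does this Line_Break class force a break after the
   character (UAX #14, LB4 and LB5)? A CR directly followed by LF is the one
   exception (no break between them) and must be handled by the caller. *)
let is_mandatory_break = function
  | `BK (* mandatory break, e.g. U+2028 LINE SEPARATOR *)
  | `CR | `LF
  | `NL (* NEXT LINE, U+0085 *) -> true
  | _ -> false
```

Given a scalar value u, a caller would write is_mandatory_break (Uucp.Break.line u).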
type grapheme_cluster =
[ `CN | `CR | `EX | `L | `LF | `LV | `LVT | `PP | `RI | `SM | `T | `V | `XX ]
val pp_grapheme_cluster : Format.formatter -> grapheme_cluster -> unit
pp_grapheme_cluster ppf g prints an unspecified representation of g on ppf.
val grapheme_cluster : Uucp.uchar -> grapheme_cluster
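To show how the grapheme_cluster classes are consumed, here is a sketch of the pairwise grapheme cluster boundary rules of UAX #29 (GB3 to GB9b, then the default break). The class names are the UCD short aliases: CN is Control, EX Extend, SM SpacingMark, PP Prepend. The sketch treats every RI RI pair as unbreakable, whereas recent revisions of UAX #29 group regional indicators two by two, and it has no emoji or ZWJ rules, so it is an illustration rather than a conforming segmenter. The name is_boundary is hypothetical.

```ocaml
(* Sketch: is there a grapheme cluster boundary between two adjacent
   scalar values with classes [prev] and [next]? *)
let is_boundary prev next =
  match prev, next with
  | `CR, `LF -> false                         (* GB3: CR x LF *)
  | (`CN | `CR | `LF), _ -> true              (* GB4: break after controls *)
  | _, (`CN | `CR | `LF) -> true              (* GB5: break before controls *)
  | `L, (`L | `V | `LV | `LVT) -> false       (* GB6: Hangul syllables *)
  | (`LV | `V), (`V | `T) -> false            (* GB7 *)
  | (`LVT | `T), `T -> false                  (* GB8 *)
  | `RI, `RI -> false                         (* simplified: no pairing *)
  | _, (`EX | `SM) -> false                   (* GB9, GB9a *)
  | `PP, _ -> false                           (* GB9b *)
  | _ -> true                                 (* otherwise: break *)
```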
type word =
[ `CR
| `DQ
| `EX
| `Extend
| `FO
| `HL
| `KA
| `LE
| `LF
| `MB
| `ML
| `MN
| `NL
| `NU
| `RI
| `SQ
| `XX ]
val pp_word : Format.formatter -> word -> unit
pp_word ppf w prints an unspecified representation of w on ppf.
val word : Uucp.uchar -> word
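A similar sketch covers a subset of the UAX #29 word boundary rules. Note that the word classes use the UCD short aliases: LE is ALetter, HL Hebrew_Letter, NU Numeric, KA Katakana, EX ExtendNumLet (not Extend, which is spelled Extend), FO Format and NL Newline. Only rules decidable from two adjacent classes are included; the MidLetter and MidNum rules, which need look-ahead, the quote rules, regional indicators and the "skip Extend/Format" effect of WB4 are deliberately left out. The helper name is hypothetical.

```ocaml
(* Sketch: a pairwise subset of the UAX #29 word boundary rules. *)
let is_word_boundary prev next =
  (* AHLetter = ALetter | Hebrew_Letter *)
  let ahletter = function `LE | `HL -> true | _ -> false in
  match prev, next with
  | `CR, `LF -> false                                 (* WB3 *)
  | (`NL | `CR | `LF), _ -> true                      (* WB3a *)
  | _, (`NL | `CR | `LF) -> true                      (* WB3b *)
  | _, (`Extend | `FO) -> false                       (* WB4, pairwise part only *)
  | p, n when ahletter p && ahletter n -> false       (* WB5 *)
  | `NU, `NU -> false                                 (* WB8 *)
  | p, `NU when ahletter p -> false                   (* WB9 *)
  | `NU, n when ahletter n -> false                   (* WB10 *)
  | `KA, `KA -> false                                 (* WB13 *)
  | (`LE | `HL | `NU | `KA | `EX), `EX -> false       (* WB13a *)
  | `EX, (`LE | `HL | `NU | `KA) -> false             (* WB13b *)
  | _ -> true                                         (* otherwise: break *)
```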
type sentence =
[ `AT
| `CL
| `CR
| `EX
| `FO
| `LE
| `LF
| `LO
| `NU
| `SC
| `SE
| `SP
| `ST
| `UP
| `XX ]
val pp_sentence : Format.formatter -> sentence -> unit
pp_sentence ppf s prints an unspecified representation of s on ppf.
val sentence : Uucp.uchar -> sentence
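Most sentence boundary rules depend on terminators and long-range context, but the forced breaks are simple. The hypothetical helper below scans an array of sentence classes and returns the indices after which UAX #29 rules SB3 and SB4 force a break: after every paragraph separator (SE, i.e. Sep, CR or LF), except between a CR and a following LF. The terminator-driven rules (AT is ATerm, ST is STerm) are not modeled.

```ocaml
(* Sketch: indices i such that SB3/SB4 force a sentence break after
   props.(i). Terminator rules are intentionally not handled. *)
let forced_sentence_breaks (props : [> `CR | `LF | `SE ] array) : int list =
  let n = Array.length props in
  let rec loop i acc =
    if i >= n then List.rev acc
    else
      match props.(i) with
      | `CR when i + 1 < n && props.(i + 1) = `LF ->
          loop (i + 1) acc                    (* SB3: CR x LF *)
      | `SE | `CR | `LF ->
          loop (i + 1) (i :: acc)             (* SB4: break after ParaSep *)
      | _ -> loop (i + 1) acc
  in
  loop 0 []
```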
type east_asian_width =
[ `A | `F | `H | `N | `Na | `W ]
val pp_east_asian_width : Format.formatter -> east_asian_width -> unit
pp_east_asian_width ppf w prints an unspecified representation of w on ppf.
val east_asian_width : Uucp.uchar -> east_asian_width
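The East_Asian_Width property (UAX #11) on its own suggests a column count: Wide (W) and Fullwidth (F) characters take two columns, while Ambiguous (A) ones take one or two depending on the rendering context. The hypothetical helper below makes that choice a parameter. It is only a sketch: it knows nothing about combining marks, controls or other zero-width characters.

```ocaml
(* Sketch: columns suggested by East_Asian_Width alone. [ambiguous_wide]
   selects how Ambiguous characters are counted (default: narrow). *)
let eaw_columns ?(ambiguous_wide = false) = function
  | `W | `F -> 2
  | `A -> if ambiguous_wide then 2 else 1
  | `H | `N | `Na -> 1
```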
val tty_width_hint : Uucp.uchar -> int
tty_width_hint u approximates u's column width as rendered by a typical character terminal.
The current implementation of the function returns either 0, 1, 2 or -1. The value -1 is only returned for scalar values for which the property is nonsensical; clients are expected to sanitize their inputs and not to use the function with these scalar values, which are those in the ranges U+0001-U+001F (C0 controls without U+0000) and U+007F-U+009F (DELETE and C1 controls).
Note. Converting a string to normalization form C before folding this function over its scalar values will, in general, yield better approximations (e.g. on Hangul).
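Taken together, the contract and the note above mean that callers fold a width function over the scalar values of a string and filter the control ranges themselves. A minimal sketch of such a fold over UTF-8 follows, using the standard library decoder String.get_utf_8_uchar (OCaml 4.14 or later). The width function is a parameter so the sketch runs without Uucp; the names string_tty_width and is_c0_c1_control are hypothetical, counting controls as 0 columns is an arbitrary policy of this sketch, and normalization to NFC is left to the caller.

```ocaml
(* The scalar values on which tty_width_hint is documented to return -1:
   U+0001-U+001F and U+007F-U+009F. *)
let is_c0_c1_control u =
  let c = Uchar.to_int u in
  (0x0001 <= c && c <= 0x001F) || (0x007F <= c && c <= 0x009F)

(* Sum per-scalar-value widths over a UTF-8 string. Controls are filtered
   out before [width] is called and counted as 0 columns (a policy choice
   of this sketch). Malformed UTF-8 decodes to U+FFFD. *)
let string_tty_width ~(width : Uchar.t -> int) (s : string) : int =
  let rec loop i acc =
    if i >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s i in
      let u = Uchar.utf_decode_uchar d in
      let w = if is_c0_c1_control u then 0 else width u in
      loop (i + Uchar.utf_decode_length d) (acc + w)
  in
  loop 0 0
```

Assuming your Uucp version represents scalar values as Uchar.t, this would be called as string_tty_width ~width:Uucp.Break.tty_width_hint s.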
Warning. This is not a normative property, only a heuristic. If you find yourself using this function, please read the following carefully.
This function is the moral equivalent of POSIX wcwidth, in that its purpose is to help align text displayed by a character terminal. It mimics wcwidth, as widely implemented, in yet another way: it is mostly wrong.
Computing column width is a surprisingly difficult task in general. Much of the software infrastructure still carries legacy assumptions about the nature of text harking back to the ASCII era. Different terminal emulators attempt to cope with general Unicode text in different ways, creating a fundamental problem: the width of text fragments will vary across terminal emulators, and there is no way of getting feedback from the output layer back into the text-producing layer.
For example, on a modern Linux system, a collection of terminals will disagree on some or all of U+00AD, U+0CBF and U+2029. They will likewise disagree about unassigned characters (category Cn), sometimes contradicting the system's wcwidth (e.g. U+0378, U+0530). Terminals using bare libxft will display complex scripts differently from terminals using HarfBuzz, and the rendering on OS X will be slightly different from both.
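tty_width_hint uses a simple and predictable width algorithm, based on Markus Kuhn's portable wcwidth:
- Scalar values in the ranges U+0001-U+001F (C0 controls without U+0000) and U+007F-U+009F (DELETE and C1 controls) have undefined width (-1).
- Scalar values with east_asian_width `W or `F (East Asian Wide and Fullwidth) have width 2.
- Scalar values with general category Mn, Me or Cf (non-spacing marks, enclosing marks and format characters), and the Hangul Jamo medial vowels and final consonants U+1160-U+11FF, have width 0.
- All other scalar values have width 1, including Cn.
This approach works well: for alphabetic scripts, and for East Asian syllabic and logographic scripts in non-decomposed form, it gives results generally consistent with a wide range of terminals. Support varies for abjad scripts in the presence of vowel marks, and it mostly breaks down on abugidas.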
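The four-tier rule above can be sketched as a single function. To stay self-contained, the sketch takes the East Asian width and a coarse, hypothetical general category class (gc_hint) as arguments instead of looking them up; with Uucp they could come from Uucp.Break.east_asian_width and a mapping of Uucp.Gc.general_category. It applies the zero-width test before the wide test, as Kuhn's wcwidth.c does. That precedence, and the omission of Kuhn's special cases for U+0000 and U+00AD SOFT HYPHEN, are choices of this sketch and may not match tty_width_hint exactly.

```ocaml
(* Hypothetical coarse General_Category classes: only those the zero-width
   tier distinguishes; everything else is `Other. *)
type gc_hint = [ `Mn | `Me | `Cf | `Other ]

(* Sketch of the four-tier width rule: -1, then 0, then 2, else 1. *)
let width_hint ~(eaw : [ `A | `F | `H | `N | `Na | `W ]) ~(gc : gc_hint)
    (u : Uchar.t) : int =
  let c = Uchar.to_int u in
  if (0x0001 <= c && c <= 0x001F) || (0x007F <= c && c <= 0x009F) then -1
  else
    match gc, eaw with
    | (`Mn | `Me | `Cf), _ -> 0                  (* marks, format chars *)
    | _ when 0x1160 <= c && c <= 0x11FF -> 0     (* Hangul medial/final jamo *)
    | _, (`W | `F) -> 2                          (* East Asian Wide/Fullwidth *)
    | _ -> 1                                     (* everything else, incl. Cn *)
```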
Moreover, non-text symbols like emoji or Yijing hexagrams will be incorrectly classified as 1-wide, but this in fact agrees with their rendering on many terminals.
Clients should not over-rely on tty_width_hint. It provides a best-effort approximation which will sometimes fail in practice.
module Low: sig .. end