Otfm (otfm.Otfm)

Unicode code points

For some reason OpenType allows the (textually meaningless) surrogate code points to be mapped to glyphs. Hence we deal with Unicode code points not scalar values (i.e. Uchar.t values).

type cp = int

The type for Unicode code points, ranges from 0x0000 to 0x10_FFFF. Any code point returned by Otfm is guaranteed to be in the range.

type cp_range = cp * cp

The type for Unicode code point ranges. Any range (u0, u1) returned by Otfm has u0 <= u1.

val is_cp : int -> bool

is_cp i is true if i is a Unicode code point.
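To illustrate the difference between code points and scalar values, the following stand-alone sketch mirrors the documented range and isolates the surrogates. The names in_cp_range and is_surrogate are hypothetical helpers, not part of Otfm; only Uchar.is_valid comes from the standard library.

```ocaml
(* Hypothetical stand-alone helpers. They mirror the documented range,
   not Otfm's actual implementation. *)

(* A code point lies in 0x0000..0x10_FFFF, surrogates included. *)
let in_cp_range (i : int) : bool = 0x0000 <= i && i <= 0x10_FFFF

(* Surrogates are the code points that are not Unicode scalar values. *)
let is_surrogate (i : int) : bool = in_cp_range i && not (Uchar.is_valid i)
```

A value for which is_surrogate holds cannot be converted to Uchar.t, which is why such mappings have to be handled as plain cp values.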

val pp_cp : Stdlib.Format.formatter -> cp -> unit

pp_cp ppf cp prints an unspecified representation of cp on ppf.

Identifiers

Types for various identifiers found in OpenType.

type encoding_id = int

The type for encoding ids.

type format_id = int

The type for format ids.

type glyph_id = int

The type for glyph ids, from 0 to 65534.

type lang = string

The type for BCP 47 language tags.

type platform_id = int

The type for platform ids.

Decoding

type error_ctx = [

| `Table of tag
| `Ttc_header
| `Offset_table
| `Table_directory

]

The type for error contexts.

type error = [

| `Invalid_cp of int
| `Invalid_cp_range of int * int
| `Invalid_offset of error_ctx * int
| `Invalid_postscript_name of string
| `Missing_required_table of tag
| `Unexpected_eoi of error_ctx
| `Unknown_composite_format of error_ctx * int
| `Unknown_flavour of tag
| `Unknown_loca_format of error_ctx * int
| `Unknown_version of error_ctx * int32
| `Unsupported_TTC
| `Unsupported_cmaps of (platform_id * encoding_id * format_id) list
| `Unsupported_glyf_matching_points

]

The type for decoding errors.

Note. In case of `Invalid_postscript_name a string of bytes is returned.

val pp_error : Stdlib.Format.formatter -> [< error ] -> unit

pp_error ppf e prints an unspecified representation of e on ppf.

val error_to_string : error -> string

error_to_string e is a human-readable error message for e.

type src = [

| `String of string

]

The type for input sources.

type decoder

The type for OpenType font decoders.

val decoder : [< src ] -> decoder

decoder src is a decoder decoding from src.

Note. This errors on font collections with `Unsupported_TTC. Use decoder_collection if you need to support font collections.

val decoder_src : decoder -> src

decoder_src d is d's input source.

val decoder_collection : [< src ] -> (decoder list, error) Stdlib.result

decoder_collection src decodes either a font or a font collection.

Flavour

type flavour = [

| `TTF
| `CFF

]

The type for OpenType flavours.

val flavour : decoder -> (flavour, error) Stdlib.result

flavour d is the flavour of the font decoded by d.

val in_collection : decoder -> bool

in_collection d is true iff d is a font of a collection.

Table decoding

These functions can be used in any order and are robust: when they return an error the decoder is back in a consistent state and can be used further. However, if flavour or table_list returns an error you can safely assume that all other functions will too. The fields are in general not documented; please refer to the OpenType specification for details.

val table_list : decoder -> (tag list, error) Stdlib.result

table_list d is the list of tables of the font decoded by d.

val table_mem : decoder -> tag -> (bool, error) Stdlib.result

table_mem d t is true if table t is in the font decoded by d.

val table_raw : decoder -> tag -> (string option, error) Stdlib.result

table_raw d t is the (unpadded) data of the table t as a string if the table t exists.

CFF table

val cff : decoder -> (string, error) Stdlib.result

cmap table

type map_kind = [

| `Glyph
| `Glyph_range

]

The type for map kinds.

Determines how a Unicode range (u0, u1) and a glyph id gid must be interpreted in the folding function of cmap.

`Glyph, all characters in the range map to gid.
`Glyph_range, u0 maps to gid, u0 + 1 to gid + 1, ... and u1 to gid + (u1 - u0).
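The two interpretations can be sketched as a function that expands one entry into individual pairs. The names expand and collect are hypothetical and not part of Otfm; the polymorphic variants are the ones documented above, so the sketch compiles without the library.

```ocaml
(* Expand one cmap entry into (code point, glyph id) pairs, in
   increasing code point order. Hypothetical helper, not an Otfm function. *)
let expand kind (u0, u1) gid =
  let rec loop u acc =
    if u < u0 then acc
    else
      let g = match kind with
        | `Glyph -> gid                  (* whole range maps to gid *)
        | `Glyph_range -> gid + (u - u0) (* consecutive glyph ids *)
      in
      loop (u - 1) ((u, g) :: acc)
  in
  loop u1 []

(* A folding function with the argument order expected by cmap: it adds
   the expanded pairs to the accumulator (in no particular order). *)
let collect acc kind range gid = List.rev_append (expand kind range gid) acc
```

Assuming a decoder d, collect could be passed as Otfm.cmap d collect []. Expanding every range is only reasonable for small fonts, since a single range may cover a very large number of code points.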

val cmap : 
  decoder ->
  ('a -> map_kind -> cp_range -> glyph_id -> 'a) ->
  'a ->
  ((platform_id * encoding_id * format_id) * 'a, error) Stdlib.result

cmap d f acc folds over a mapping from Unicode code points to glyph ids by reading the cmap table.

Tip. Use the convenience cmap_fold_uchars function to directly get a map from Uchar.t to glyph identifiers.

Limitations. Only the format 13 (last resort font), format 12 (UCS-4) and format 4 (UCS-2) cmap table formats are supported.

If multiple tables are present, it favours 13 over 12 over 4. If multiple tables of the same format are present it takes the first one it finds.

If no supported cmap table is found, the error `Unsupported_cmaps is returned with the list of (platform id, encoding id, format id) triples available in the font.

glyf table

type glyf_loc

The type for glyph locations. See loca table.

type glyph_simple_descr = (bool * int * int) list list

The type for simple glyph descriptions. A list of contours; each contour is a list of points with a boolean indicating whether the point is on or off curve.

type glyph_composite_descr =
  (glyph_id * (int * int) * (float * float * float * float) option) list

The type for glyph composites. A list of components made of a glyph id, a translation and an optional linear transform a b c d (column major).
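As a rough sketch of how one component could be positioned, the function below applies the optional linear transform and then the translation to a point of the component glyph. The name place_point is hypothetical. The reading of "column major" as the matrix with columns (a, b) and (c, d), and the choice to add the translation after the linear part, are assumptions; consult the OpenType specification for the exact composite glyph rules.

```ocaml
(* Apply one component's placement to a point of the component glyph.
   Assumption: with (a, b, c, d) column major the matrix is
     | a c |
     | b d |
   and the translation (dx, dy) is added after the linear part. *)
let place_point (_gid, (dx, dy), m) (x, y) =
  let x, y = float_of_int x, float_of_int y in
  let x', y' = match m with
    | None -> x, y (* no transform: translation only *)
    | Some (a, b, c, d) -> (a *. x +. c *. y, b *. x +. d *. y)
  in
  (x' +. float_of_int dx, y' +. float_of_int dy)
```

The first argument has the shape of one element of a glyph_composite_descr list, so the function can be mapped over the points of each referenced glyph.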

type glyph_descr =
  [ `Simple of glyph_simple_descr | `Composite of glyph_composite_descr ]
  * (int * int * int * int)

The type for glyph descriptions. A simple or composite description with the glyph's (minx, miny, maxx, maxy) bounding box.
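A value of this shape can be inspected by matching on the two cases. The following sketch summarizes a description as a string; describe is a hypothetical name, and the code relies only on the structure of the types given above.

```ocaml
(* Summarize a glyph description. Composite glyphs hold no points
   themselves, so only their component count is reported. *)
let describe (descr, (minx, miny, maxx, maxy)) =
  let size = Printf.sprintf "%dx%d" (maxx - minx) (maxy - miny) in
  match descr with
  | `Simple contours ->
      (* Sum the number of points over all contours. *)
      let points = List.fold_left (fun n c -> n + List.length c) 0 contours in
      Printf.sprintf "simple: %d contours, %d points, bbox %s"
        (List.length contours) points size
  | `Composite components ->
      Printf.sprintf "composite: %d components, bbox %s"
        (List.length components) size
```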

val glyf : decoder -> glyf_loc -> (glyph_descr, error) Stdlib.result

glyf d loc is the glyph description located at loc by reading the glyf table. Glyph locations are obtained via loca table.

head table

type head = {

head_font_revision : int32;
head_flags : int;
head_units_per_em : int;
head_created : float;
(*
Unix timestamp.
*)
head_modified : float;
(*
Unix timestamp.
*)
head_xmin : int;
head_ymin : int;
head_xmax : int;
head_ymax : int;
head_mac_style : int;
head_lowest_rec_ppem : int;
head_index_to_loc_format : int;

}

The type for representing head tables.

val head : decoder -> (head, error) Stdlib.result

head d is the head table.

hhea table

type hhea = {

hhea_ascender : int;
hhea_descender : int;
hhea_line_gap : int;
hhea_advance_width_max : int;
hhea_min_left_side_bearing : int;
hhea_min_right_side_bearing : int;
hhea_xmax_extent : int;
hhea_caret_slope_rise : int;
hhea_caret_slope_run : int;
hhea_caret_offset : int;

}

The type for hhea tables.

val hhea : decoder -> (hhea, error) Stdlib.result

hhea d is the hhea table.

hmtx table

val hmtx : 
  decoder ->
  ('a -> glyph_id -> int -> int -> 'a) ->
  'a ->
  ('a, error) Stdlib.result

hmtx d f acc folds over the horizontal metrics of the font by reading the hmtx table. f is applied on each entry with f acc' gid adv lsb with gid the glyph id (guaranteed to range, in order, from 0 to glyph count minus one), adv the (unsigned) advance width, and lsb the (signed) left side bearing.
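A minimal sketch of a folding function for hmtx follows. The names collect_metrics and metrics_table are hypothetical; the sketch relies only on the documented guarantee that glyph ids arrive in order starting from 0.

```ocaml
(* Folding function with the argument order expected by hmtx: it records
   each glyph's advance width and left side bearing in arrival order. *)
let collect_metrics acc _gid adv lsb = (adv, lsb) :: acc

(* Glyph ids arrive in order from 0, so reversing the accumulated list
   and converting it to an array gives a table indexed by glyph id. *)
let metrics_table acc = Array.of_list (List.rev acc)
```

Assuming a decoder d, the table could be obtained by applying metrics_table to the Ok value of Otfm.hmtx d collect_metrics [].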

name table

val name : 
  decoder ->
  ('a -> int -> lang -> string -> 'a) ->
  'a ->
  ('a, error) Stdlib.result

name d f acc folds over the name records of the font by reading the name table. f is applied on each name id entry with f acc' nid lang name with nid the name id, lang the language tag, and name the UTF-8 encoded name value.

Note. The module normalizes Windows language ids to lowercased BCP 47 ids. Language tags found in language tag records should be BCP 47 language tags but are not checked for conformance.

Tip. If you are looking for the PostScript name use postscript_name.

Limitations. Looks up data only in platform ids 0, 2 and 3 (Unicode, ISO and Windows) with UTF-16BE encoding and reports only the data of the first one it finds for a given name id.
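As an illustration of the folding function expected by name, the sketch below keeps the first value seen for a wanted name id, regardless of language. The names pick_name and full_name_id are hypothetical; name id 4 is the full font name in the OpenType name table.

```ocaml
(* Name id of the full font name in the OpenType name table. *)
let full_name_id = 4

(* Folding function with the argument order expected by name, once
   partially applied to the wanted id: it keeps the first matching value
   and ignores the language tag. *)
let pick_name wanted acc nid _lang name =
  match acc with
  | Some _ -> acc (* already found, keep it *)
  | None -> if nid = wanted then Some name else None
```

Assuming a decoder d, the lookup would read Otfm.name d (pick_name full_name_id) None.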

OS/2 table

type os2 = {

os2_x_avg_char_width : int;
os2_us_weight_class : int;
os2_us_width_class : int;
os2_fs_type : int;
os2_y_subscript_x_size : int;
os2_y_subscript_y_size : int;
os2_y_subscript_x_offset : int;
os2_y_subscript_y_offset : int;
os2_y_superscript_x_size : int;
os2_y_superscript_y_size : int;
os2_y_superscript_x_offset : int;
os2_y_superscript_y_offset : int;
os2_y_strikeout_size : int;
os2_y_strikeout_position : int;
os2_family_class : int;
os2_panose : string;
(*
10 bytes
*)
os2_ul_unicode_range1 : int32;
os2_ul_unicode_range2 : int32;
os2_ul_unicode_range3 : int32;
os2_ul_unicode_range4 : int32;
os2_ach_vend_id : int32;
os2_fs_selection : int;
os2_us_first_char_index : int;
os2_us_last_char_index : int;
os2_s_typo_ascender : int;
os2_s_type_descender : int;
os2_s_typo_linegap : int;
os2_us_win_ascent : int;
os2_us_win_descent : int;
os2_ul_code_page_range_1 : int32 option;
os2_ul_code_page_range_2 : int32 option;
os2_s_x_height : int option;
os2_s_cap_height : int option;
os2_us_default_char : int option;
os2_us_break_char : int option;
os2_us_max_context : int option;

}

The type for OS/2 tables.

val os2 : decoder -> (os2, error) Stdlib.result

os2 d is the OS/2 table.

kern table

type kern_info = {

kern_dir : [ `H | `V ];
kern_kind : [ `Min | `Kern ];
kern_cross_stream : bool;

}

The type for kerning (sub)table information.

val kern : 
  decoder ->
  ('a -> kern_info -> [ `Skip | `Fold ] * 'a) ->
  ('a -> glyph_id -> glyph_id -> int -> 'a) ->
  'a ->
  ('a, error) Stdlib.result

kern d t p acc folds over the kerning tables of d by reading the kern table. t is called on each new (sub)table; the table's pairs are skipped if it returns `Skip, otherwise p acc' left right value is called on each kerning pair of the table. The function returns acc if there is no kern table.

Limitations. Only format 0 kerning tables are supported.
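The two callbacks of kern can be sketched as follows. The names select and add_pair are hypothetical, and the record type is a local mirror of the documented kern_info, declared only so that the sketch compiles on its own; with Otfm in scope its own type would be used instead.

```ocaml
(* Local mirror of the documented record so that the sketch is
   self-contained. With Otfm, use Otfm.kern_info instead. *)
type kern_info =
  { kern_dir : [ `H | `V ];
    kern_kind : [ `Min | `Kern ];
    kern_cross_stream : bool }

(* Table selector: fold only plain horizontal kerning (sub)tables. *)
let select acc i =
  match i.kern_dir, i.kern_kind, i.kern_cross_stream with
  | `H, `Kern, false -> `Fold, acc
  | _ -> `Skip, acc

(* Pair function: accumulate ((left, right), value) in a list. *)
let add_pair acc left right value = ((left, right), value) :: acc
```

Assuming a decoder d, and with the local type replaced by Otfm's, the call would read Otfm.kern d select add_pair [].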

loca table

val loca : decoder -> glyph_id -> (glyf_loc option, error) Stdlib.result

loca d gid looks up the location of the glyph with id gid by reading the loca table. The result can be used with glyf to look up the glyph.

Convenience decodes

These functions look up data in the right table.

val glyph_count : decoder -> (int, error) Stdlib.result

glyph_count d is the number of glyphs in the font (bounded by 65535).

val postscript_name : decoder -> (string option, error) Stdlib.result

postscript_name d is the PostScript name of d. It is looked up and validated as mandated by the OTF standard; don't rely on the name table if you really need this information.

val cmap_fold_uchars : 
  decoder ->
  uchar:(Stdlib.Uchar.t -> glyph_id -> 'a -> 'a) ->
  surrogate:(cp -> glyph_id -> 'b -> 'b) ->
  'a ->
  'b ->
  ((platform_id * encoding_id * format_id) * ('a * 'b), error) Stdlib.result

cmap_fold_uchars d ~uchar ~surrogate uacc cacc is src, (us, cps) with

src identifying the source table of the data.
us the result of the uchar fold starting with uacc.
cps the result of the surrogate fold starting with cacc.

This uses cmap, see the limitations there.
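A minimal sketch of the two callbacks follows: scalar values go into a map keyed by Uchar.t while surrogate mappings are merely counted. Umap, uchar, surrogate and glyph_of are hypothetical names chosen for this illustration; Map.Make and Uchar come from the standard library.

```ocaml
(* Maps keyed by Unicode scalar values. *)
module Umap = Map.Make (Uchar)

(* Callback for scalar values: record the glyph id of each character. *)
let uchar u gid acc = Umap.add u gid acc

(* Callback for surrogates: they carry no text meaning, so only count them. *)
let surrogate _cp _gid count = count + 1

(* Look up the glyph id of a character in the resulting map. *)
let glyph_of m u = Umap.find_opt u m
```

Assuming a decoder d, the call would read Otfm.cmap_fold_uchars d ~uchar ~surrogate Umap.empty 0.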

Limitations

As it stands Otfm has the following limitations. Some of these may be lifted in the future and a few of these can be overcome by pre-processing your font (e.g. removing hinting information to reduce the font size). See also the individual table decoding functions for other limitations.

The whole font needs to be loaded in memory as a string. This may be a limiting factor on 32-bit platforms (but non-.ttc font files tend to be smaller than 16 MB).
Table checksums are not verified.

Examples

The following code prints the PostScript name of the font on stdout.

  let otf_postscript_name bytes =
    let d = Otfm.decoder (`String bytes) in
    match Otfm.postscript_name d with
    | Error e -> Format.eprintf "@[%a@]@." Otfm.pp_error e
    | Ok (Some n) -> Format.printf "%s@." n
    | Ok None -> ()

Module `Otfm`

Tags