Module Cmarkit_latex

Rendering CommonMark to LaTeX.

Generates LaTeX fragments. Consult the integration notes for requirements on the document.

See a quick example and another one.

Warning. Rendering outputs are unstable: they may be tweaked even between minor versions of the library.

Rendering

val of_doc : ?backend_blocks:bool -> Cmarkit.Doc.t -> string

of_doc d is a LaTeX fragment for d. See renderer for more details and documentation about rendering options.
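As a minimal sketch of typical use, assuming the cmarkit library is installed and linked, the following parses a CommonMark string with Cmarkit.Doc.of_string and renders it with of_doc. The sample input and the names md and fragment are illustrative only.

```ocaml
(* Minimal sketch: parse CommonMark and render it to a LaTeX fragment.
   Requires the cmarkit library; the input text is made up. *)
let md = "# Title\n\nSome *emphasized* text.\n"

let fragment : string =
  let doc = Cmarkit.Doc.of_string md in
  Cmarkit_latex.of_doc doc

let () = print_string fragment
```

The result is only a fragment: it still has to be placed in a document whose preamble loads the packages listed in the integration notes.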

Renderer

val renderer : ?backend_blocks:bool -> unit -> Cmarkit_renderer.t

renderer is a default LaTeX renderer. This renders the strict CommonMark abstract syntax tree and the supported Cmarkit extensions.

The inline, block and document renderers always return true. Unknown block and inline values are rendered as a LaTeX comment.

The following options are available:

  • backend_blocks, if true, code blocks with language =latex are written verbatim in the output and any other code block whose language starts with = is dropped. Defaults to false.
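The option can be tried with a short sketch like the one below, which assumes the cmarkit library is available. The input document and the names with_raw and without_raw are made up for illustration.

```ocaml
(* Sketch: pass raw LaTeX through a "=latex" code block.
   Requires the cmarkit library; the input text is made up. *)
let md = "Before.\n\n```=latex\n\\clearpage\n```\n\nAfter.\n"

let doc = Cmarkit.Doc.of_string md

(* With backend_blocks the content of the block is written verbatim. *)
let with_raw = Cmarkit_latex.of_doc ~backend_blocks:true doc

(* Default (false): the block gets no special treatment. *)
let without_raw = Cmarkit_latex.of_doc doc

let () = print_string with_raw
```

Since raw blocks bypass escaping, enabling the option only makes sense for input you trust.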

See this example to extend or selectively override the renderer.
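A selective override might look like the following sketch, which assumes the cmarkit library and uses Cmarkit_renderer.make and Cmarkit_renderer.compose. The choice of \hrulefill and the names custom_block, renderer and latex_of_md are illustrative, not part of the library.

```ocaml
(* Sketch: render thematic breaks with a custom rule and defer
   everything else to the default LaTeX renderer.
   Requires the cmarkit library; \hrulefill is an arbitrary choice. *)
let custom_block : Cmarkit_renderer.block = fun c -> function
  | Cmarkit.Block.Thematic_break _ ->
      Cmarkit_latex.newline c;
      Cmarkit_renderer.Context.string c {|\hrulefill|};
      Cmarkit_latex.newline c;
      true (* handled here *)
  | _ -> false (* fall back on the default renderer *)

let renderer : Cmarkit_renderer.t =
  let default = Cmarkit_latex.renderer () in
  let custom = Cmarkit_renderer.make ~block:custom_block () in
  (* [custom] is tried first; [default] handles what it declines. *)
  Cmarkit_renderer.compose default custom

let latex_of_md (md : string) : string =
  Cmarkit_renderer.doc_to_string renderer (Cmarkit.Doc.of_string md)
```

Returning false from the custom function is what lets the default renderer take over for all other blocks.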

Render functions

Only useful if you extend the renderer.

val newline : Cmarkit_renderer.context -> unit

newline c starts a new line, except on the first call on c, which is a no-op.

val latex_escaped_uchar : Cmarkit_renderer.context -> Stdlib.Uchar.t -> unit

latex_escaped_uchar c u renders the UTF-8 encoding of u on c properly escaped for LaTeX. That is, the characters &%$#_{}~^\ are escaped. U+0000 is also rendered as Uchar.rep.

val buffer_add_latex_escaped_uchar : Stdlib.Buffer.t -> Stdlib.Uchar.t -> unit

buffer_add_latex_escaped_uchar is latex_escaped_uchar but appends to a buffer value.

val latex_escaped_string : Cmarkit_renderer.context -> string -> unit

latex_escaped_string c s renders string s on c with characters &%$#_{}~^\ escaped. U+0000 is also rendered as Uchar.rep.

val buffer_add_latex_escaped_string : Stdlib.Buffer.t -> string -> unit

buffer_add_latex_escaped_string is latex_escaped_string but acts on a buffer value.
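The buffer variants are also handy outside a renderer, for instance to escape text that is spliced into a hand-written preamble. A minimal sketch, assuming the cmarkit library (the helper name latex_escape and the sample text are made up):

```ocaml
(* Sketch: escape arbitrary text before splicing it into LaTeX source.
   Requires the cmarkit library; the sample text is made up. *)
let latex_escape (s : string) : string =
  let b = Buffer.create (String.length s + 16) in
  Cmarkit_latex.buffer_add_latex_escaped_string b s;
  Buffer.contents b

let () = print_endline (latex_escape "50% of $10 & more_or_less")
```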

LaTeX integration notes

Along with the built-in graphicx package, the following LaTeX packages are needed to use the output of the default renderer:

tlmgr install enumitem listings hyperref  # Required
tlmgr install ulem                        # Strikethrough extension
tlmgr install bera fontspec               # Optional

This means you should have at least the following in your document preamble:

% Required
\usepackage{graphicx}
\usepackage{enumitem}
\usepackage{listings}
\usepackage{hyperref}
\usepackage[normalem]{ulem} % Strikethrough extension

% Optional
\usepackage[scaled=0.8]{beramono} % A font for code blocks
\usepackage{fontspec}             % Supports more Unicode characters

See the sections below for more details.

Character encoding

The output is UTF-8 encoded. UTF-8 became the default encoding for LaTeX in 2018, but if you are using an older version a \usepackage[utf8]{inputenc} may be needed.

Using xelatex rather than pdflatex avoids getting stuck on missing glyphs.

Links

The hyperref package is used to render links (\href) and autolinks (\url). Link destinations starting with # are assumed to refer to section labels and are rendered using the \hyperref macro, with the # chopped.
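The dispatch on link destinations can be pictured with the hypothetical sketch below. It is not the library's code: the function names latex_link and latex_autolink are invented, and text and dest are assumed to be already escaped for LaTeX.

```ocaml
(* Hypothetical sketch of the dispatch described above; this is not the
   library's code. [text] and [dest] are assumed to be already escaped. *)
let latex_link ~(text : string) ~(dest : string) : string =
  let len = String.length dest in
  if len > 0 && dest.[0] = '#' then
    (* Section label reference: chop the leading '#'. *)
    Printf.sprintf {|\hyperref[%s]{%s}|} (String.sub dest 1 (len - 1)) text
  else
    Printf.sprintf {|\href{%s}{%s}|} dest text

(* Autolinks carry no separate text, only a destination. *)
let latex_autolink ~(dest : string) : string =
  Printf.sprintf {|\url{%s}|} dest
```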

Images

Images are inserted using the graphicx package. Only images with relative URLs are supported. Those that point to external resources on the web are turned into links.

Section labels

Section labels are added to the output whenever Cmarkit.Block.Heading.id holds a value. If the identifier already exists it is made unique by appending "-" and the first number starting from 1 that makes it unique. Also, the character _ seems problematic in labels even when escaped, so it is mapped to - (if you know a better way, get in touch).
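The uniqueness scheme can be sketched as follows. This is a hypothetical illustration using only the standard library, not the library's code: the name unique_label and the seen table are invented, and mapping _ to - before the clash check is an assumption.

```ocaml
(* Hypothetical sketch of the label scheme described above; this is not
   the library's code. [seen] records the labels already emitted. *)
let unique_label (seen : (string, unit) Hashtbl.t) (id : string) : string =
  (* Assumption: '_' is mapped to '-' before checking for clashes. *)
  let id = String.map (function '_' -> '-' | c -> c) id in
  (* Try id-1, id-2, ... until an unused label is found. *)
  let rec pick n =
    let candidate = Printf.sprintf "%s-%d" id n in
    if Hashtbl.mem seen candidate then pick (n + 1) else candidate
  in
  let label = if Hashtbl.mem seen id then pick 1 else id in
  Hashtbl.add seen label ();
  label
```

With a fresh table, three headings whose identifier is intro would get the labels intro, intro-1 and intro-2.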

Lists

To support the starting point of ordered lists without having to fiddle with enumi counters, the enumitem package is used.

Code blocks

If a language lang can be extracted from a code block info string, the listings package is used with the corresponding language in a lstlisting environment. Otherwise the built-in verbatim environment is used.

Note that the listings package has no definition for the ocaml language; the default renderings are a bit subpar and break on character literals with double quotes. This improves things:

\lstset{
  columns=[c]fixed,
  basicstyle=\small\ttfamily,
  keywordstyle=\bfseries,
  upquote=true,
  commentstyle=\slshape,
  breaklines=true,
  showstringspaces=false}

\lstdefinelanguage{ocaml}{language=[objective]caml,
   % Fixes double quotes in char literals
   literate={'"'}{\textquotesingle "\textquotesingle}3
            {'\\"'}{\textquotesingle \textbackslash"\textquotesingle}4,
}

Document frame

The default renderer only generates LaTeX fragments. You may want to add a document frame. For example:

let latex_doc_of_md ?(title = "") md =
  let doc = Cmarkit.Doc.of_string md in
  let r = Cmarkit_latex.renderer () in
  let buffer_add_doc = Cmarkit_renderer.buffer_add_doc r in
  let buffer_add_title = Cmarkit_latex.buffer_add_latex_escaped_string in
  let maketitle = if title = "" then "" else {|\maketitle|} in
  Printf.kbprintf Buffer.contents (Buffer.create 1024)
{|\documentclass{article}

\usepackage{graphicx}
\usepackage{enumitem}
\usepackage{listings}
\usepackage{hyperref}
\usepackage[normalem]{ulem}
\usepackage[scaled=0.8]{beramono}
\usepackage{fontspec}

\lstset{
  columns=[c]fixed,
  basicstyle=\small\ttfamily,
  keywordstyle=\bfseries,
  upquote=true,
  commentstyle=\slshape,
  breaklines=true,
  showstringspaces=false}

\lstdefinelanguage{ocaml}{language=[objective]caml,
  literate={'"'}{\textquotesingle "\textquotesingle}3
            {'\\"'}{\textquotesingle \textbackslash"\textquotesingle}4,
}

\title{%a}
\begin{document}
%s
%a
\end{document}|} buffer_add_title title maketitle buffer_add_doc doc
