Cmarkit_commonmark
Rendering CommonMark to CommonMark.
Generates CommonMark. If your document was parsed with layout:true, it preserves most of the source layout on output. This won't be perfect; make sure you understand the details before reporting issues.
See an example.
Warning. Rendering outputs are unstable. They may be tweaked even between minor versions of the library.
val of_doc : Cmarkit.Doc.t -> string
of_doc d is a CommonMark document for d. See renderer for more details.
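A minimal sketch of the parse-then-render path described above, assuming the cmarkit library is installed and linked. The helper name commonmark_of_string and the sample input are illustrative and not part of the library.

```ocaml
(* Parse with layout information kept, then render back to CommonMark.
   [commonmark_of_string] is a hypothetical helper name. *)
let commonmark_of_string (src : string) : string =
  let doc = Cmarkit.Doc.of_string ~layout:true src in
  Cmarkit_commonmark.of_doc doc

let () =
  (* Illustrative input, not taken from the library documentation. *)
  print_string (commonmark_of_string "# Title\n\nSome *emphasis* here.\n")
```

Since rendering outputs are documented as unstable, the exact bytes printed may differ between versions of the library.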
val renderer : unit -> Cmarkit_renderer.t
renderer () is the default CommonMark renderer. This renders the strict CommonMark abstract syntax tree and the supported Cmarkit extensions.
The inline, block and document renderers always return true. Unknown block and inline values are rendered by an HTML comment (as permitted by the CommonMark specification).
See this example to extend or selectively override the renderer.
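One way to selectively override the default renderer can be sketched as follows. It assumes Cmarkit_renderer.make and Cmarkit_renderer.compose behave as in the Cmarkit_renderer documentation (the custom renderer is tried first and falls back on the default when it returns false). The uppercasing behaviour and the names upper_text and upper_of_doc are hypothetical.

```ocaml
(* Hypothetical override: render text inlines in upper case and let the
   default CommonMark renderer handle everything else. *)
let upper_text : Cmarkit_renderer.t =
  let inline c = function
  | Cmarkit.Inline.Text (t, _) ->
      (* Reuse the module's own escaping for text inlines. *)
      Cmarkit_commonmark.escaped_text c (String.uppercase_ascii t);
      true
  | _ -> false (* fall back on the default renderer *)
  in
  Cmarkit_renderer.make ~inline ()

let upper_of_doc (doc : Cmarkit.Doc.t) : string =
  let default = Cmarkit_commonmark.renderer () in
  let r = Cmarkit_renderer.compose default upper_text in
  Cmarkit_renderer.doc_to_string r doc

let () = print_string (upper_of_doc (Cmarkit.Doc.of_string "hello *world*"))
```

Returning false for every case the override does not handle keeps the default behaviour for the rest of the abstract syntax tree.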
Only useful if you extend the renderer.
val newline : Cmarkit_renderer.context -> unit
newline c starts a new line, except on the first call on c, which is a nop.
type indent = [
| `I of int  (* Indentation by given amount. *)
| `L of int * string * int * Stdlib.Uchar.t option  (* Indent before, list marker, indent after, list item task extension. *)
| `Q of int  (* Indentation followed by a block quote marker and a space. *)
| `Fn of int * Cmarkit.Label.t  (* Indent before, label (footnote extension). *)
]
The type for specifying block indentation.
val push_indent : Cmarkit_renderer.context -> indent -> unit
push_indent c i pushes i on the current indentation of c. This does not render anything.
val pop_indent : Cmarkit_renderer.context -> unit
pop_indent c pops the last indentation pushed on c. This does not render anything.
val indent : Cmarkit_renderer.context -> unit
indent c outputs the current indentation on c. Note that `L and `Fn get replaced by an `I indent on subsequent lines, that is, the list or footnote marker is output only once.
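The indentation stack can be illustrated with a sketch of a custom block renderer. Everything named Note here is a hypothetical extension invented for the example. The sketch also assumes that the default paragraph renderer calls newline and indent itself before writing its content, so the custom case only needs to push and pop.

```ocaml
(* Hypothetical block extension: a note that wraps another block and is
   rendered like a block quote. *)
type Cmarkit.Block.t += Note of Cmarkit.Block.t

let note_renderer : Cmarkit_renderer.t =
  let block c = function
  | Note b ->
      (* Push a block quote indent with no leading spaces. Nothing is
         output here; the nested block is assumed to emit it via [indent]. *)
      Cmarkit_commonmark.push_indent c (`Q 0);
      Cmarkit_renderer.Context.block c b;
      Cmarkit_commonmark.pop_indent c;
      true
  | _ -> false
  in
  Cmarkit_renderer.make ~block ()

let note_commonmark_renderer : Cmarkit_renderer.t =
  Cmarkit_renderer.compose (Cmarkit_commonmark.renderer ()) note_renderer

(* Build a one paragraph document wrapped in a [Note] and render it. *)
let render_note (s : string) : string =
  let text = Cmarkit.Inline.Text (s, Cmarkit.Meta.none) in
  let par = Cmarkit.Block.Paragraph.make text in
  let block = Note (Cmarkit.Block.Paragraph (par, Cmarkit.Meta.none)) in
  Cmarkit_renderer.doc_to_string note_commonmark_renderer
    (Cmarkit.Doc.make block)

let () = print_string (render_note "hello")
```

Pairing every push_indent with a pop_indent keeps the indentation of sibling blocks unaffected.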
val escaped_string :
?esc_ctrl:bool ->
Cmarkit_renderer.context ->
Char_set.t ->
string ->
unit
escaped_string ?esc_ctrl c cs s renders s on c with characters in cs backslash escaped. If esc_ctrl is true (default) ASCII control characters are escaped to decimal escapes.
buffer_add_escaped_string b cs s is escaped_string but appends to a buffer value.
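The buffer variant can be tried without a rendering context. The sketch below assumes that Cmarkit_commonmark.Char_set is a standard set of characters (as produced by Set.Make) and that buffer_add_escaped_string takes a Buffer.t followed by the character set and the string. The helper name escape_with is hypothetical.

```ocaml
(* Backslash escape the characters of [chars] occurring in [s].
   [escape_with] is a hypothetical helper name. *)
let escape_with (chars : char list) (s : string) : string =
  let cs = Cmarkit_commonmark.Char_set.of_list chars in
  let b = Buffer.create (String.length s + 8) in
  Cmarkit_commonmark.buffer_add_escaped_string b cs s;
  Buffer.contents b

let () = print_endline (escape_with ['*'; '_'] "a*b_c")
```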
val escaped_text : Cmarkit_renderer.context -> string -> unit
escaped_text c s renders s on c trying to be smart about escaping CommonMark structural symbols for Cmarkit.Inline.Text inlines. We assume text can be anywhere in a sequence of inlines and in particular that it can start a line. This function also takes into account the existence of the extensions.
As such we escape:
'-' '+' '_' '=' only if present at s.[0].
'#' '`' '~' (strikethrough extension).
'&' if followed by a US-ASCII letter or '#'.
'!' if it is the last character of s.
'.' or ')' only if preceded by at most 9 digits to the start of text.
'*' '_' '\' '<' '>' '[' ']', ASCII control characters, '$' (inline math extension), '|' (table extension).
buffer_add_escaped_text b s is escaped_text but appends to a buffer value.
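A small sketch to observe these escaping rules on sample strings, assuming buffer_add_escaped_text takes a Buffer.t followed by the string. The helper name escape_text and the sample inputs are illustrative.

```ocaml
(* Escape [s] as if it were the content of a text inline.
   [escape_text] is a hypothetical helper name. *)
let escape_text (s : string) : string =
  let b = Buffer.create (String.length s + 8) in
  Cmarkit_commonmark.buffer_add_escaped_text b s;
  Buffer.contents b

let () =
  (* Print a few inputs next to their escaped form. *)
  List.iter
    (fun s -> Printf.printf "%S -> %S\n" s (escape_text s))
    ["a*b"; "-item"; "1. item"; "wow!"; "x & y"]
```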
The abstract syntax tree has a few block cases and data fields to represent the source document layout. This makes it possible to update CommonMark documents without normalizing them too much when they are parsed with layout:true.
To keep things reasonably simple a few things are not attempted like:
In general we try to keep the following desirable properties for the abstract syntax tree definition:
layout:tree should all render to the same HTML.
In practice, since CommonMark is not context free, point 1. is not always achieved. In particular in Cmarkit.Inline.Code_span the number of delimiting backticks depends on the code content (Cmarkit.Inline.Code_span.of_string computes that for you).
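The dependency between code content and backtick count can be observed with a sketch that builds a code span programmatically and renders it. It assumes Cmarkit.Block.Paragraph.make and Cmarkit.Doc.make accept the values shown with their optional arguments left out. The helper name commonmark_of_code is hypothetical.

```ocaml
(* Render a single paragraph holding one code span whose content is
   [code]. [commonmark_of_code] is a hypothetical helper name. *)
let commonmark_of_code (code : string) : string =
  (* [of_string] computes the layout, including the backtick count. *)
  let cs = Cmarkit.Inline.Code_span.of_string code in
  let inline = Cmarkit.Inline.Code_span (cs, Cmarkit.Meta.none) in
  let par = Cmarkit.Block.Paragraph.make inline in
  let block = Cmarkit.Block.Paragraph (par, Cmarkit.Meta.none) in
  Cmarkit_commonmark.of_doc (Cmarkit.Doc.make block)

let () =
  print_string (commonmark_of_code "plain");
  print_newline ();
  (* Content that itself contains a backtick needs longer delimiters. *)
  print_string (commonmark_of_code "a `tick` b")
```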
The renderer performs almost no checks on the layout data. You should be careful if you fill these yourself since you could generate CommonMark that will be misinterpreted. Layout data of pristine nodes coming out of Cmarkit.Doc.of_string, or created with the Cmarkit.Inline and Cmarkit.Block constructors, should not need your attention (respect their input constraints though).
We say that a CommonMark render:
is correct, if the result renders the same HTML as the source document. This can be checked with the cmarkit tool included in the distribution:
cmarkit commonmark --html-diff mydoc.md
If a difference shows up, the rendering is said to be incorrect.
round trips, if the result is byte-for-byte equal to the source document. This can be checked with the cmarkit tool included in the distribution:
cmarkit commonmark --diff mydoc.md
If a difference shows up, the rendering does not round trip but it may still be correct.
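The two definitions can also be expressed as library-level checks. This is only a sketch: the function names is_correct and round_trips are hypothetical, it assumes Cmarkit_html.of_doc with ~safe:false as the HTML renderer, and it is not claimed to match what the cmarkit tool does internally.

```ocaml
(* CommonMark render of [src], parsed with layout preservation. *)
let commonmark_of_string (src : string) : string =
  Cmarkit_commonmark.of_doc (Cmarkit.Doc.of_string ~layout:true src)

(* HTML render of [src]. *)
let html_of_string (src : string) : string =
  Cmarkit_html.of_doc ~safe:false (Cmarkit.Doc.of_string src)

(* Correct: the render produces the same HTML as the source. *)
let is_correct (src : string) : bool =
  String.equal (html_of_string src) (html_of_string (commonmark_of_string src))

(* Round trips: the render is byte-for-byte equal to the source. *)
let round_trips (src : string) : bool =
  String.equal src (commonmark_of_string src)

let () =
  let src = "Hello *world*\n" in
  Printf.printf "correct: %b, round trips: %b\n"
    (is_correct src) (round_trips src)
```

A document can fail round_trips and still satisfy is_correct, which is the distinction drawn above.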
In general lack of round trip is due to:
Please do not report issues for differences that are due to the following:
'>' is made mandatory.
Cmarkit.Doc.t value.
Simple and implemented round trip improvements to the renderer are welcome.
Please do not report issues for incorrect renderings that are due to the following (and unlikely to be fixed):
Use of entities and character references around structural CommonMark symbols can make things go wrong. These get resolved after inline parsing because they can't be used to stand for structural CommonMark symbols, however once they have been resolved they can interact with parsing. Here is an example:
*emph&nbsp;*
It parses as emphasis. But if we render it to CommonMark, the non-breaking space renders as is and we get:
*emph *
which no longer parses as emphasis.
Note in this particular case it is possible to do something about it by being smarter about the context when escaping. However there's a trade-off between renderer complexity and the (conjectured) paucity of these cases.
Otherwise, if you spot an incorrect rendering please report a minimal reproduction case.