Easy OCaml scripts
Write programs not scripts. I know… but at least quit your abusive relationship to sh
derived linguistic disasters.
A b0caml script is no different from an ocaml one: it just restricts the toplevel directives you can use and makes it easy for you to tap into installed third-party modules.
Here's a simple echo
script:
> cat > echo <<EOCAML
#!/usr/bin/env b0caml
let echo oc ss = output_string oc (String.concat " " ss ^ "\n")
let () = echo stdout (List.tl (Array.to_list Sys.argv))
EOCAML
> chmod +x ./echo
> ./echo grunt
grunt
> b0caml ./echo grunt # for Windows compatible invocations
grunt
Except for the shebang line, nothing different.
If you have compiled modules in a directory DIR
that you want to use in your script, add the following directive after the shebang line, before the source and before any comment because b0caml
authors are very lazy:
#directory "DIR"
If DIR
is relative it is made absolute with respect to the directory of the script. Using the "+DIR"
syntax looks for modules in the DIR
directory of the directories mentioned in the OCAMLPATH
environment variable.
ocaml
also has the #directory
directive but b0caml
treats it a bit differently. First b0caml
errors if DIR
or +DIR
does not resolve to an existing directory. Second b0caml
knows how to load the implementation and dependencies of the modules it finds in that directory.
The following script uses the Ptime
and Ptime_clock
modules. These modules are installed by the ptime package in the ptime and ptime/clock/os subdirectories of a directory assumed to be in the OCAMLPATH:
> cat > local-time <<EOCAML
#!/usr/bin/env b0caml
#directory "+ptime"
#directory "+ptime/clock/os"
let to_string () =
let now = Ptime_clock.now () in
let tz_offset_s = Ptime_clock.current_tz_offset_s () in
Format.asprintf "%a" (Ptime.pp_human ?tz_offset_s ()) now
let main () = print_endline (to_string ())
let () = if !Sys.interactive then () else main ()
EOCAML
> chmod +x local-time
> ./local-time
1995-09-12 11:27:13 +02:00
Since ocaml
does not know how to load the implementation of the interfaces it finds in #directory
directives you cannot directly load a b0caml
script in the toplevel.
b0caml provides the --top (or --utop) option which loads a script and the modules it needs in the OCaml toplevel for interactive testing and debugging:
> b0caml --top local-time  # load script and deps in the toplevel
        OCaml version 4.08.0

# Local_time.to_string ();;
- : string = "1995-09-12 11:27:13 +02:00"
The name of the module for the script is determined by mangling the script filename.
If your script parses command line arguments or uses exit
you should properly isolate these computations in a main
function and prevent its invocation whenever Sys.interactive is true
(see for example the source of local-time
above).
Repeat after me, write a program not a script.
A b0caml
script can import an OCaml implementation source SRC
with the #mod_use "SRC"
directive. SRC
must be a regular OCaml implementation, it cannot be a b0caml
script. A relative SRC
is made absolute with respect to the directory of the script.
These directives must also appear after the shebang line and before the script source or any comment. The file SRC
must exist or the script errors.
A quick and dirty configuration file for a script screams for #mod_use:
> cat > conf.ml <<EOCAML
let lang = "fr"
EOCAML
> cat > miaow <<EOCAML
#!/usr/bin/env b0caml
#mod_use "conf.ml"
let scream = match Conf.lang with
| "fr" -> "Miaou!"
| _ -> "Miaow!"

let main () = print_endline scream
let () = if !Sys.interactive then () else main ()
EOCAML
> chmod +x miaow
> ./miaow
Miaou!
#mod_use mangles the filename of the path to define a module name and an implementation in which the contents of the file are included literally. In the example above the line #mod_use "conf.ml" is simply expanded to:
module Conf = struct
#1 "/absolute/path/to/conf.ml"
let lang = "fr"
end
Knowing OCaml's scoping rules it should be easy to see that a module you #mod_use can refer to the modules #mod_used before it. But it is your duty to provide them in the right order. The relative order between #directory and #mod_use directives doesn't matter; you can consider all #directory directives to be written before the first #mod_use.
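As a rough illustration of the ordering rule, here is what the final source could look like for a script that uses #mod_use on two files in sequence. The file names conf.ml and greeting.ml and their contents are hypothetical, and the line directives b0caml inserts are left out for brevity:

```ocaml
(* Hypothetical expansion of:
     #mod_use "conf.ml"
     #mod_use "greeting.ml"
   Line directives are omitted. *)

module Conf = struct
  (* contents of conf.ml *)
  let lang = "fr"
end

module Greeting = struct
  (* contents of greeting.ml: it may refer to Conf because Conf is
     defined before it. Swapping the two directives would make the
     compiler reject Conf.lang as unbound. *)
  let scream = match Conf.lang with
    | "fr" -> "Miaou!"
    | _ -> "Miaow!"
end

(* The script's own unit implementation comes last and sees both. *)
let () = print_endline Greeting.scream
```

Since each #mod_use becomes an ordinary module definition, the usual top-to-bottom scoping of OCaml structure items is all there is to it.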
A #mod_used implementation is constrained by an interface if there's a side .mli for the file you include. For the full details see the description of the #mod_use directive below.
If you want to see the #mod_use expansions that are performed by b0caml on a script, the following invocation prints the final script source before it gets compiled:
> b0caml --source miaow
No formal dependency management is provided for the third-party modules you use – WRITE A PROGRAM NOT A SCRIPT !
However the #directory
and #mod_use
directives of a script can be resolved with the deps
subcommand:
> b0caml deps local-time
/usr/lib/ocaml/ptime/
/usr/lib/ocaml/ptime/clock/os/
> b0caml deps miaow
/home/camelus/conf.ml
These invocations error and the program exits with a non-zero exit code if the directives do not resolve. For a +DIR
directory directive, resolution checks the directory DIR
exists in at least one of the directories mentioned in the OCAMLPATH
environment variable.
The --raw
option prevents resolution and reports the verbatim directive arguments.
> b0caml deps --raw local-time
+ptime
+ptime/clock/os
> b0caml deps --raw miaow
conf.ml
The --root
option also eschews resolution and outputs root directory names of +
directory directives:
> b0caml deps --root local-time  # Extract +DIR roots
ptime
> opam install $(b0caml deps --root local-time)  # You must be joking...
The first time a b0caml
script runs it gets compiled and cached. This incurs a small overhead. If you want to avoid it, or simply test that it compiles without running it, use the --compile
option:
b0caml --compile local-time # compile and cache the script
b0caml log local-time     # Output build log of [local-time]
b0caml log -l local-time  # More details...
By default the script compilation cache location is determined according to the XDG_CACHE_HOME
convention. The actual location of the cache can be printed via
b0caml cache path               # print path to the cache
b0caml cache delete local-time  # delete local-time build
b0caml cache                    # delete the cache
b0caml cache size               # print stats about the cache
b0caml cache trim               # trim the cache to 50% of its size
Configuration is looked up in XDG_CONFIG_DIR/b0caml/config
. The configuration file is a sequence of s-expressions. Here's a sample file:
(max-cache-size-mb 500)
(compilation-target byte)  ; force use of bytecode (ocamlc)
(compilation-env           ; Specify the OCaml compilation environment
  (OCAMLPATH /usr/local/ocaml)
  (PATH /usr/local/bin))
The following keys are defined:
(compilation-target CODE), where CODE is byte or native.

We hope to eventually convince ocamlmerlin
to understand #directory
directives and abide by OCAMLPATH
the way b0caml
does. This will have merlin work out of the box in your script without having to specify anything. If you are using #mod_use
you will be punished accordingly.
One thing that remains is for your editor to treat files with b0caml's shebang line as OCaml files. Follow the instructions below according to your editor.
Add one of the following lines to your .emacs depending on the OCaml mode you are using.
(add-to-list 'interpreter-mode-alist '("b0caml" . caml-mode))
(add-to-list 'interpreter-mode-alist '("b0caml" . tuareg-mode))
A b0caml
script is an optional shebang line, followed by white space (no comments) separated directives, followed by an OCaml unit implementation.
Using an RFC 5234 grammar this reads as:
script    = [shebang] *(ws directive) ws unit-implementation
shebang   = "#!" *(%x00-FF) nl
directive = dir-dir / dir-use
dir-dir   = "#directory" ws %x22 dchar *dchar %x22
dir-use   = "#mod_use" ws %x22 dchar *dchar %x22
dchar     = escape / cont / ws / %x21 / %x23-5B / %x5D-7E / %x80-FF
escape    = %x5C (%x20 / %x22 / %x5C)
cont      = %x5C nl ws
ws        = *(%x20 / %x09 / %x0A / %x0B / %x0C / %x0D)
nl        = %x0A / %x0D / %x0D %x0A
unit-implementation = ... ; See the syntax in the OCaml manual
This syntax is a subset of ocaml's. However b0caml attributes slightly different semantics to the directives.
The following parts can be distinguished in a b0caml
script:
b0caml
.#directory
directives and those of #mod_use
directives.
The final source of the script is created by concatenating the expansions of the #mod_use directives followed by the OCaml unit implementation. This source is compiled in a compilation environment defined by the #directory directives to a module or a program to be executed.
#directory directive
The syntax of the #directory directive is:
#directory "DIR"
The semantics is to simply add the file path DIR
to the ordered list of directories looked up for third-party modules. From a compilation perspective you can see that as -I DIR
options given to the compiler.
If DIR
is relative it is made absolute with respect to the directory of the script. The +DIR
syntax indicates to add all the existing DIR
directories from the directories mentioned in the OCAMLPATH
environment variable.
The directories have to resolve to existing directories or the script errors. A +DIR directory must exist in at least one of the directories of OCAMLPATH.
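The resolution rules above can be sketched in a few lines of OCaml. This is only an illustration of the described behaviour, not b0caml's implementation: the function names are made up, and the OCAMLPATH separator (':' on Unix, ';' on Windows) is an assumption.

```ocaml
(* Sketch only: mirrors the rules described above, not b0caml's code. *)

(* Directories listed in OCAMLPATH. Assumption: entries are separated
   by ':' (';' on Windows). *)
let ocamlpath_roots () =
  match Sys.getenv_opt "OCAMLPATH" with
  | None -> []
  | Some v ->
      let sep = if Sys.win32 then ';' else ':' in
      List.filter (fun d -> d <> "") (String.split_on_char sep v)

let is_dir d = Sys.file_exists d && Sys.is_directory d

(* [resolve ~script_dir ~roots dir] returns the existing directories
   denoted by the argument [dir] of a #directory directive, or an
   error if there is none. *)
let resolve ~script_dir ~roots dir =
  let candidates =
    if String.length dir > 0 && dir.[0] = '+' then
      (* +DIR: look for DIR in every OCAMLPATH directory *)
      let rel = String.sub dir 1 (String.length dir - 1) in
      List.map (fun root -> Filename.concat root rel) roots
    else if Filename.is_relative dir then
      (* relative DIR: made absolute w.r.t. the script's directory *)
      [ Filename.concat script_dir dir ]
    else [ dir ]
  in
  match List.filter is_dir candidates with
  | [] -> Error (Printf.sprintf "#directory %S: no such directory" dir)
  | dirs -> Ok dirs
```

A call such as `resolve ~script_dir:"/home/camelus" ~roots:(ocamlpath_roots ()) "+ptime"` would then yield every existing ptime directory found along OCAMLPATH, each of which corresponds to one -I option.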
#mod_use directive
The syntax of the #mod_use directive is:
#mod_use "PATH"
The semantics is to define a module with the contents of PATH
at that location. The name of the module is defined by mangling the file name of PATH.
If PATH
is relative it is interpreted relative to the script's directory.
For example assuming the filename of PATH is file.ml, the directive expands to:
module File = struct
#1 "PATH"
(* contents of PATH *)
end
If PATH has a corresponding .mli file, say MLI, in the same directory, the directive expands to:
module File : sig
#line 1 "MLI"
(* contents of MLI *)
end = struct
#line 1 "PATH"
(* contents of PATH *)
end
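To make the effect of the interface concrete, here is a small self-contained sketch of such an expansion. The contents attributed to file.mli and file.ml are hypothetical and the line directives are left out:

```ocaml
(* Hypothetical expansion of #mod_use "file.ml" with a file.mli next
   to it. Line directives are omitted. *)
module File : sig
  (* contents of file.mli *)
  val greet : string -> string
end = struct
  (* contents of file.ml *)
  let prefix = "Hello, "  (* not in the interface: hidden from the script *)
  let greet name = prefix ^ name
end

(* The script only sees what the interface exports. Referring to
   File.prefix here would be a compile-time error. *)
let () = print_endline (File.greet "camel")
```

This is the same sealing you would get by compiling file.ml and file.mli separately, which keeps a #mod_used file usable as a regular compilation unit later on.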
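b0caml uses the following filename mangling convention to produce OCaml module names from arbitrary filenames:
- Remove any trailing .ml or .mli.
- Map any dash - (0x2D) or dot . (0x2E) to an underscore _ (0x5F).
- Where the result would not be a valid capitalized module name, as for _release.ml below, prefix it with 'M'.

Note that the transformation is not injective. Here are a few examples: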
filename               Module name
----------------------------------------
publish-website        Publish_website
publish_website        Publish_website
import-data.ml         Import_data
import-data.xml.ml     Import_data_xml
import-data.script     Import_data_script
mix+match              Mix2Bmatch
_release.ml            M_release
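The examples above can be reproduced with a short function. This is a sketch and not b0caml's own code: the name mangle is made up, and two rules are inferred from the examples rather than stated in the text, namely that other unsupported characters are replaced by their two-digit uppercase hexadecimal code (mix+match) and that 'M' is prefixed when the result does not start with a letter (_release.ml).

```ocaml
(* Sketch of the mangling convention, reconstructed from the examples. *)

(* Drop a trailing .ml or .mli, if any. *)
let strip_ext name =
  if Filename.check_suffix name ".mli" then Filename.chop_suffix name ".mli"
  else if Filename.check_suffix name ".ml" then Filename.chop_suffix name ".ml"
  else name

let mangle_char buf = function
  | '-' | '.' -> Buffer.add_char buf '_'
  | ('a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_') as c -> Buffer.add_char buf c
  | c ->
      (* Assumption: any other byte becomes its hex code, e.g. '+' -> "2B" *)
      Buffer.add_string buf (Printf.sprintf "%02X" (Char.code c))

let mangle filename =
  let base = strip_ext (Filename.basename filename) in
  let buf = Buffer.create (String.length base) in
  String.iter (mangle_char buf) base;
  let name = Buffer.contents buf in
  if name = "" then "M"
  else
    match name.[0] with
    | 'a' .. 'z' | 'A' .. 'Z' -> String.capitalize_ascii name
    | _ -> "M" ^ name (* assumption: prefix when not starting with a letter *)

let () =
  List.iter
    (fun f -> Printf.printf "%-22s %s\n" f (mangle f))
    [ "publish-website"; "publish_website"; "import-data.ml";
      "import-data.xml.ml"; "import-data.script"; "mix+match"; "_release.ml" ]
```

The first two inputs both yield Publish_website, which is the non-injectivity mentioned above: two scripts or #mod_used files whose names differ only by - versus _ end up with the same module name.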
On script execution b0caml terminates with the exit code of your script. However that code may be determined by b0caml itself in case it doesn't get to execute the script. These codes may muddle with your own script's exit codes; here's the list:
- 127, compilation error. This is what shells usually report when they can't find a command in the tool search path.
- 125, unexpected internal error.
- 124, command line parsing error.
- 123, configuration error.

Given a script script.ml,
its final source SRC is obtained by expanding script.ml's #mod_use directives as described above and appending script.ml's OCaml compilation unit. SRC is compiled to a byte or native code executable via a single invocation of the OCaml compiler, with the includes specified via #directory directives and all the library archives that are found in these directories, along with their dependencies as determined by archive dependency lookup.