B00 manual

Design considerations

Kernel extraction from B0

Concepts kept:

Concepts dropped and to be seen how we can recover them at a higher level:

TODO and resolve

Handling removals

Knowing files before the build starts doesn't fit the build model, besides removals is under the responsability of `Memo` clients (e.g. for now `odig` doesn't deal with it). The approach currently taken is the first one below. The second one could be considered but hampers correctness.

  1. Clean builds. Usually builds are performed in a dedicated directory, always delete the previous directory at the beginning of the build and restart from scratch. Deleting is slow and should not impede the build, hence the B00.Memo.trash that simply renames to a trash directory and the asynchronous deletion at the end of the build.
  2. Diffing. If we know which files were generated in the previous run (easy to get from the build build log), we can remove stale artefacts after the new run has been completed (and do a clean build if you lost the info). This may however lead to incorrect builds (`-I` problems) where tool may pickup stale artefacts

Note. Now that we do longer use hard links we would have to hash. So what follows below likely becomes impractical.

A middle ground between the two could be the following. Currently we trash paths in a dedicated directory and delete it asynchronously at the end of the build as this may be slow. However we could track the renames to the path directory. When we revive from the cache to a given path one can check if the path exists in the trash with the same inode number, if it does we can rename it from there to the build rather than create a new link.

The nice thing of this approach is that at the end of the build we get in the trash exactly the things that really need to be deleted, rather than delete all the build all of the time (note that this is still more than say a `make` based build system because those overwrites paths).

Dealing with unpredictable written file paths

The system supports well tool with predictable file path writes. These tools either write to statically known file paths or allow to determine them by invoking another tool (e.g. odoc {compile,html}-targets). However the system should also be made to support tools with unpredictable file path writes because those are not uncommon.

A tool with predictable writes is invoked as follows:

Memo.spawn ~reads ~writes @@ tool cmd

That is we know the written file paths before the tool gets to execute and we indicate them in writes. This serves two purposes:

  1. If the spawn is uncached, writes indicates the ordered list of file paths whose contents (output) is captured after the operation executed and ends up being associated to the spawn's hash (input).
  2. If the spawn is cached, we use the spawn's hash (input) to retrieve the list of ordered file contents (outputs) and copy these to the given file paths, in order.

This means that in the first case we don't really need the information beforehand and we can determine it after the spawn occured, just before caching, for example by scanning the file system (a post_exec hook -- that executes after the build operation execution but before cache recording -- can be used for that).

However in the second case, when we want to revive the spawn, armed with the hash of the operation we can only retrieve file contents, but we do not know which file paths need to be bound to the contents, since the only thing the file cache stores is an association hash -> contents list. For tools with predictable file path writes, not mandating an actual path to bind to in the cache itself allows to revive build operations to different build paths (assuming their contents do not depend on their actual path location). This is used e.g. to share .cmi/.cmti file compilation between the doc and exec outcomes of brzo which happen in different build dirs or for caching in larger and larger contexts).

Now this problem could be easily solved by simply producing an operation manifest. e.g. by writing some kind of manifest file in the post_exec hook with all the files that have been produced. The logic could then become something along these lines:

Memo.read manifest @@ function
| Some writes -> Memo.spawn ~reads ~writes @@ tool cmd
| None ->
   let post_exec o = ... (* set writes of op [o] and write [manifest] *) in
   Memo.spawn ~reads ~post_exec @@ tool cmd

However this somehow assumes that manifest can be persisted across builds. That is not impossible to do but it is made difficult by the way removals are handled (clean builds), if we generate that file in the post execution hook it will be deleted when we come back later to rebuild, so we also need to either cache this file or store it in a build dir that is not deleted (but then we might lose the manifest info while the operation is still cached which is problematic). Another problem is that a name needs to be generated for the manifest file, a natural name would of course be the operation hash itself but in the above logic we need the name before the operation is specified and thus before the hash can be computed so it's a bit unfit given the way the API is currently designed.

These considerations seem to indicate that a good solution for the problem is to extend the B000.File_cache model to allow in some cases to also store the relative file paths associated to the file contents. That is we conceptually move from a file cache as:

key -> contents list

to

key -> contents list * (Fpath.t list) option

This leaves the current support for predictable writes (B000.File_cache.add, B000.File_cache.revive) unchanged but allows to add the new operations:

manifest_add :
t -> key -> string -> root:Fpath.t -> Fpath.t list -> (bool, string) result

which stores the file contents and their corresponding paths relativised w.r.t to root.

manifest_revive :
t -> key -> root:Fpath.t -> (string option * Fpath.t list, string) result

which revives the relativised file paths by making them absolute with respect to root.

It is even quite simple to devise the system so that a manifest_add can be B000.File_cache.revived. But in general these tools with unpredictable writes the actual file names hierarchy is meanigful (e.g. hyperlinked HTML files) so it may not be that useful.

In any case these two low-level operations should make it easy to add support for build operations with unpredicable writes at the higher level. Basically the support can happen at the generic B000.Op by adding a field to store an optional manifest root. If that field is specified the operation is handled using the new manifest_add or manifest_revive functions above, if not we proceed as is currently the case using B000.File_cache.revive and B000.File_cache.add.

Design answers

A few explanation about design choices we may not remember in the future.

Why arent't file permission of a read file not part of the build operation stamp ?

It is clearly something that can influence tool outputs and failures. However so can any bit of file metadata (e.g. groups) in general and a boundary needs to be drawn. For possible wide area cache sharing scenarios it also feels better not to consider, say changes in other file permissions.

Related to the answer about caching failures.

Why are failed operations not cached ?

First let's define what caching build operations would likely mean. It would mean caching it's metadata bit, i.e. for tool spawns the user interface output and the exit code. Caching the actual (likely partial) file writes would be difficult to do since the actual file system state on tool error may be difficult to characterize.

Caching failed operations may report build errors more quicky for example if a build operation is slow in erroring and nothing changed as far as its operation stamp indicates.

One problem though is is that the failure may be due to a reason that is not captured by the operation stamp. In that case even though the problem may be resolved by the user, the operation stamp would not change and the operation would be revived as a failure rather than re-executed.

To give a concrete example, suppose an operation fails because of the file permissions of a file it reads. This file permission is not part of the build operation stamp (see this answer). So changing the file permission to make the build operation succeed would not only revive the old failure, but likely also output a puzzling error message unrelated to the current state of the system. The way out for the user is long head scratching ended by a clear of the cache for that operation. That doesn't feel like a great user experience.

It seems not caching failures leads to a better user experience for b0 based build systems, at least as long as this remains true. Also build failure information should be, most of the time, available in the build logs so it may not be necessary to reinvoke a long failing operation to get its error back.