The design tries to enforce well-formed streams as a sequence of non-empty slices terminated by Bytesrw.Bytes.Slice.eod
. We want to ensure that read
and write
functions of stream readers and writers are not called with unexpected empty slices or that they are called after they returned or were called with Bytesrw.Bytes.Slice.eod
.
For example the Bytesrw.Bytes.Slice
creation functions raise Invalid_argument
when they unexpectedly produce an empty slice and those that can produce them have dedicated names that include _eod
to make users and code readers aware that this might terminate the stream.
Also readers and writers internally overwrite their read
and write
function as soon as Bytesrw.Bytes.Slice.eod
is seen so that no closure are kept longer than it should be.
The error strategy may look involved but it caters for:
Bytesrw_zstd
where a single case is used for all errors.while retaining the ability to pattern match on the error of specific streams and making sure informative human error messages can always be produced.
For users it means that when they use streams:
To enforce good behaviour by stream reader and writers the following type structure is used:
Bytesrw.Bytes.Stream.error
types to which stream formats add they own case.Bytesrw.Bytes.Stream.Error
exception which holds a value of type error
cannot be directly raised as it demands the creation of a Bytesrw.Bytes.Stream.error_context
value for which there is no constructor.Bytesrw.Bytes.Stream.error
, or Bytesrw.Bytes.Reader.error
or Bytesrw.Bytes.Writer.error
. These functions ask for a Bytesrw.Bytes.Stream.format_error
, whose creation asks stream formats to identify themelves and describe how their errors can be stringified.Effectively this system can be seen as an OCaml-like exception system inside the Bytesrw.Bytes.Stream.Error
exception.
read/write
primitiveswrite
return the number of written bytes from the slice would be useful for certain scenario, most notably write filters. It would be the moral equivalent of readers' push backs, except in this case it's inconvenient for the client and leads to really_write
… not desirable.read
. We do not provide an interface for specifying the number of bytes to read. That would entail storing the last slice in the reader. But then if too much is requested then either we need to start buffering internally because of slice validity or potentially return less but then this leads to really_read
and buffering again. The example of utf-8 text codec shows that we can really get rid of buffering except for a tiny buffer for overlapping reads. It seems better to let higher level abstractions handle that.length
s rather than size
s because that's what the Stdlib
generally uses.first
, last
and length
for slice indexing and pos
, for streams. That way the terminology does not overlap which makes it for a clearer exposition. Incidentally our usage of pos
is consistent with the Stdlib
(though the index vs position terminology is unfortunate in my opinion).If we determine that this API works well we should consider proposing to upstream the extension of Bytesrw.Bytes
. The core code is mostly trivial it's "just" three data structures. But added code is more code to maintain.
However here are a few things to consider:
~length
and ~slice_length
, the Stdlib uses ~len
. A counter here is to mention that this corresponds to the name of the fields of the structures and there is little point to multiply the names for the same thing (and using Bytes.Slice.len
would be weird and inconsistent with {String,Bytes}.length
).first
+ length
for specifying ranges a few functions are also provided with the (extremely nice I have to say) inclusive ?first
, ?last
range mechanism which was was percieved negatively by upstream for strings. For wrong reasons in my opinion revolving around aesthetics of +/-1
(counter 1: clarity; counter 2: depends on your use case, you then also need to add +/-1) and provability (?). It's a pity since it's a very usable interface that avoids a lot of footgunish index computations and brings general code reading clarity by requiring less mental labor. It's also consistent with OCaml's own inclusive for
loop and certainly not an anomaly in the language design space. See the also the discussion here.Dependencies. A few things will need to be dispatched differently module wise.
Basically the channel stuff should move to the {In,Out}channel
modules and the buffer stuff to the Buffer
module. The formatters won't make it or will end up in the unusably long-winded Format
.
Besides that I don't think any functionality would be lost, in particular we made sure not to use Format
in the Bytesrw.Bytes.Stream.error
interface.
One thing I can see not make it is perhaps the nice hex dump formatter Bytesrw.Bytes.pp_hex
which is maybe too much code for upstream to take.
Bytesrw.Bytes.Reader.of_string
. Upstream will likely want no unsafe usage. These places can be found with git grep 'Unsafe is
ok'
.