Evidence.Dataset
Observation sequences.
A dataset is a non-empty, immutable, indexed sequence of observations.
The type for randomness in ranges [0;n-1]. The function must be such that rand n is a uniformly distributed random number in the range [0;n-1].
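Below is a minimal sketch of a function that satisfies this contract, built on OCaml's Random.State. The type name rand, its definition as int -> int, and the helpers rand_of_seed and respects_range are assumptions made for illustration. The actual type definition is not shown in this section.

```ocaml
(* Hypothetical: the randomness type is modelled as a plain int -> int
   function; the actual definition is not shown in this section. *)
type rand = int -> int

(* Build a rand from a seed. Random.State.int st n returns an integer
   uniformly distributed in [0; n-1]; n must be positive. *)
let rand_of_seed (seed : int) : rand =
  let st = Random.State.make [| seed |] in
  fun n -> Random.State.int st n

(* Check the range part of the contract on a number of draws:
   every result must lie in [0; n-1]. *)
let respects_range (rand : rand) n ~draws =
  let ok = ref true in
  for _ = 1 to draws do
    let r = rand n in
    if r < 0 || r > n - 1 then ok := false
  done;
  !ok
```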
The type for counts. A count n must not be zero. If n is positive, it represents the first n observations. If n is negative, it represents the last -n observations.
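To make the sign convention concrete, here is a small sketch that resolves a count against an array. The helpers slice_of_count and take_count are hypothetical and not part of Evidence. Clamping a count larger than the array length is also an assumption of this sketch.

```ocaml
(* Hypothetical helper (not part of Evidence): resolve a count n against a
   sequence of length len as a (start, length) slice. Positive n selects
   the first n elements, negative n the last -n. Clamping to len is an
   assumption of this sketch. *)
let slice_of_count ~len n =
  if n = 0 then invalid_arg "count must not be zero";
  let k = min (abs n) len in
  if n > 0 then (0, k) else (len - k, k)

(* Apply a count to an array, returning a fresh sub-array. *)
let take_count n a =
  let start, k = slice_of_count ~len:(Array.length a) n in
  Array.sub a start k
```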
init ?doc n obs initializes a dataset with n observations drawn from obs. obs is used to derive formatters and serializers, and for certain function defaults. Raises Invalid_argument if n < 1.
val length : 'o t -> int
length d is the (positive) number of observations in d.
val get : int -> 'o t -> 'o
get i d is the zero-based i-th observation of d. Raises Invalid_argument if i is not in [0;length d - 1].
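A minimal model of length and get follows. It assumes a dataset is just a non-empty array. This is an illustration only, not Evidence's actual representation.

```ocaml
(* Hypothetical model: a dataset represented as a non-empty array. This
   is for illustration only, not Evidence's actual representation. *)
type 'o t = 'o array

let length (d : 'o t) : int = Array.length d

(* Zero-based access; raises Invalid_argument when i is not in
   [0; length d - 1], as documented for get. *)
let get i (d : 'o t) : 'o =
  if i < 0 || i > length d - 1 then
    invalid_arg (Printf.sprintf "get: index %d not in [0;%d]" i (length d - 1))
  else d.(i)
```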
val fold : (int -> 'o -> 'a -> 'a) -> 'o t -> 'a -> 'a
fold f d acc folds f over d starting with acc.
val iter : (int -> 'o -> unit) -> 'o t -> unit
iter f d iterates f over d.
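The indexed fold and iteration can be modelled over plain arrays as shown here. The array model, the increasing index order and the helper sum_even_indices are assumptions of this sketch, since the order of traversal is not stated above.

```ocaml
(* Hypothetical model over arrays. Folding in increasing index order is an
   assumption of this sketch; the documentation does not state the order. *)
let fold (f : int -> 'o -> 'a -> 'a) (d : 'o array) (acc : 'a) : 'a =
  let acc = ref acc in
  Array.iteri (fun i o -> acc := f i o !acc) d;
  !acc

let iter (f : int -> 'o -> unit) (d : 'o array) : unit = Array.iteri f d

(* Example: sum the observations stored at even indices. *)
let sum_even_indices (d : int array) : int =
  fold (fun i o acc -> if i mod 2 = 0 then acc + o else acc) d 0
```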
map ~obs f d maps observations of d to obs observations with f. obs defaults to Obs.empty.
set i o d is a new dataset with the i-th observation of d set to o. For efficiency, do not call this function repeatedly; update is a better option.
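The cost of repeated set calls becomes visible in a copy-on-write model, where each call copies the whole sequence. This sketch assumes an array representation. It does not model update, whose signature is not given here.

```ocaml
(* Hypothetical model: with an immutable array-backed dataset, set has to
   copy the array, costing O(length d) per call. In this model, k calls
   cost O(k * length d), which is why a batched update is preferable. *)
let set i o (d : 'o array) : 'o array =
  let d' = Array.copy d in
  d'.(i) <- o; (* raises Invalid_argument if i is out of bounds *)
  d'
```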
append d0 d1 appends the observations of d1 to those of d0. The obs of d0 is used for the result.
concat ds concatenates the observations of the groups ds. The obs of the first element is used for the result.
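Here is a sketch of how append and concat keep the obs of their first argument. In this sketch obs is modelled as a plain string, a hypothetical stand-in for Evidence's observation descriptor. Having concat raise Invalid_argument on an empty list is also an assumption.

```ocaml
(* Hypothetical model: a dataset pairs an observation descriptor obs
   (a string here, standing in for Evidence's descriptor) with its data. *)
type 'o t = { obs : string; data : 'o array }

(* The result keeps the descriptor of d0. *)
let append d0 d1 = { obs = d0.obs; data = Array.append d0.data d1.data }

(* The result keeps the descriptor of the first element; rejecting the
   empty list is an assumption of this sketch. *)
let concat = function
  | [] -> invalid_arg "concat: empty list"
  | d :: _ as ds ->
      { obs = d.obs; data = Array.concat (List.map (fun d -> d.data) ds) }
```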
group ~by d groups observations of d by the equivalence relation determined by by. The sequence of groups is ordered by by. See also Dataset.Var.group.
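One way to group by an equivalence relation is to stable-sort the observations and split them where neighbours differ. This sketch assumes by is a comparison function that returns 0 for equivalent observations. Its actual type is not stated in this section.

```ocaml
(* Hypothetical sketch: by is assumed to be a total-order comparison, and
   two observations are equivalent when by a b = 0. Groups are returned in
   increasing by order; within a group the original order of d is kept
   because the sort is stable. *)
let group ~(by : 'o -> 'o -> int) (d : 'o array) : 'o array list =
  let a = Array.copy d in
  Array.stable_sort by a;
  let groups = ref [] and current = ref [] in
  (* Push the group being built (if any) onto the result. *)
  let close_group () =
    match !current with
    | [] -> ()
    | l -> groups := Array.of_list (List.rev l) :: !groups; current := []
  in
  Array.iter
    (fun o ->
      match !current with
      | x :: _ when by x o <> 0 -> close_group (); current := [ o ]
      | _ -> current := o :: !current)
    a;
  close_group ();
  List.rev !groups
```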
sort ~by d sorts the observations of d using by. If stable is true (defaults to false), the order of d is kept on ties. If take is specified, only the first take (take > 0) or last -take (take < 0) observations of the sorted result are returned.
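The stable and take options can be sketched over arrays as follows. Several details are assumptions here: that by is a comparison function, that take is clamped to the length, and that take = 0 is rejected.

```ocaml
(* Hypothetical sketch mirroring the documented options: stable selects a
   stable sort, and take keeps the first (> 0) or last (< 0) observations
   of the sorted result. Clamping and rejecting take = 0 are assumptions. *)
let sort ?(stable = false) ?take ~(by : 'o -> 'o -> int) (d : 'o array) : 'o array =
  let a = Array.copy d in
  (if stable then Array.stable_sort by a else Array.sort by a);
  match take with
  | None -> a
  | Some 0 -> invalid_arg "sort: take must not be zero"
  | Some n ->
      let k = min (abs n) (Array.length a) in
      if n > 0 then Array.sub a 0 k else Array.sub a (Array.length a - k) k
```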
shuffle ~rand d randomly permutes the observations of d using rand for randomness.
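A shuffle driven by a rand function can be written as a Fisher-Yates shuffle, as sketched below. The documentation does not say whether Evidence uses this exact algorithm. The array model is an assumption.

```ocaml
(* Sketch of a Fisher-Yates shuffle driven by a rand function, assumed to
   return a uniform integer in [0; n-1] for rand n. Returns a new array. *)
let shuffle ~(rand : int -> int) (d : 'o array) : 'o array =
  let a = Array.copy d in
  for i = Array.length a - 1 downto 1 do
    let j = rand (i + 1) in (* uniform in [0; i] *)
    let t = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- t
  done;
  a
```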
val find_index : ?start:int -> (int -> 'o -> bool) -> 'o t -> int option
find_index ~start p d is the smallest index i >= start, if any, such that p i (get i d) is true. start defaults to 0 and None is returned if it is out of bounds.
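Here is a minimal model of find_index over arrays. It follows the documented default for start and returns None when start is out of bounds. The array representation is an assumption of this sketch.

```ocaml
(* Hypothetical model over arrays: scan upwards from start and return the
   first index whose observation satisfies p, or None. *)
let find_index ?(start = 0) (p : int -> 'o -> bool) (d : 'o array) : int option =
  let len = Array.length d in
  let rec loop i =
    if i >= len then None
    else if p i d.(i) then Some i
    else loop (i + 1)
  in
  if start < 0 || start >= len then None else loop start
```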
sub ~first ~last ~count d are the observations whose indices exist in the range [first;last]. first defaults to 0 and last to length d - 1. If count is specified, only the first count (count > 0) or last -count (count < 0) elements of the range are returned. Raises Invalid_argument if the result is empty, that is, if there are no observations of d in the range or if count = 0.
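A model makes the interplay of first, last and count easier to see. This sketch clips the range to the valid indices, which is one reading of "indices that exist in the range", and clamps count to the size of the range. Both choices are assumptions.

```ocaml
(* Hypothetical model: clip [first; last] to the valid indices of d, then
   keep the first (count > 0) or last (count < 0) elements of the range.
   Raises Invalid_argument on an empty result or count = 0. *)
let sub ?(first = 0) ?last ?count (d : 'o array) : 'o array =
  let len = Array.length d in
  let last = match last with None -> len - 1 | Some l -> l in
  let lo = max first 0 and hi = min last (len - 1) in
  let n = hi - lo + 1 in
  if n <= 0 then invalid_arg "sub: empty range";
  match count with
  | None -> Array.sub d lo n
  | Some 0 -> invalid_arg "sub: count = 0"
  | Some c ->
      let k = min (abs c) n in
      if c > 0 then Array.sub d lo k else Array.sub d (hi - k + 1) k
```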
filter p d are the observations of d that satisfy p, or None if none do.
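A dataset is never empty, so filter needs a way to report that nothing matched. Here is a sketch over arrays that assumes p takes only the observation. Its exact signature is not given in this section.

```ocaml
(* Hypothetical model: the predicate shape 'o -> bool is an assumption.
   None stands for "no observation satisfies p", since an empty dataset
   cannot exist. *)
let filter (p : 'o -> bool) (d : 'o array) : 'o array option =
  match List.filter p (Array.to_list d) with
  | [] -> None
  | l -> Some (Array.of_list l)
```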
sample ~rand n d samples n observations from d using rand as a random source. Raises Invalid_argument if n < 1.
witness_of ~group_by groups by group_by and, for each group, keeps the lowest-index observation as the group representative. The result is ordered by group_by.
distinct d ~vars is witness_of ~order_by:(Var.order vars). These are the observations that have distinct vars variables (defaults to vars). Keeps the observation with the lowest index as the representative. The result is ordered in the lexicographic order of vars. Raises Invalid_argument if vars is empty.
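The witness idea behind witness_of and distinct can be sketched as follows. The sketch assumes group_by is a comparison function under which equivalent observations compare equal. A stable sort keeps index order among ties, so the first observation of each run of equivalent observations is the lowest-index representative.

```ocaml
(* Hypothetical sketch: group_by is assumed to be a comparison function,
   with two observations in the same group when group_by a b = 0. For each
   group the lowest-index observation is kept; the result is ordered by
   group_by. *)
let witness_of ~(group_by : 'o -> 'o -> int) (d : 'o array) : 'o array =
  let a = Array.copy d in
  (* Stable sort: among ties, lower indices come first. *)
  Array.stable_sort group_by a;
  let witnesses = ref [] in
  Array.iter
    (fun o ->
      match !witnesses with
      | w :: _ when group_by w o = 0 -> () (* same group: keep earlier witness *)
      | _ -> witnesses := o :: !witnesses)
    a;
  Array.of_list (List.rev !witnesses)
```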
module Var : sig ... end
Variable processing.
of_array ?lend ?obs a is a dataset from obs observations in a. Raises Invalid_argument if a is empty. If lend is true (default), the client lends the array to the dataset for its own use; in particular, a must never be modified afterwards.
val to_array : ?borrow:bool -> 'o t -> 'o array
to_array ?borrow d is d as a (non-empty) array. If borrow is true (defaults to false), the client must never mutate the resulting array.
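The lend and borrow flags describe sharing versus copying. The following is a hypothetical model: the record type is an assumption and the obs argument of of_array is omitted. With sharing, mutating the array would also mutate the dataset, which is why the client must leave it untouched.

```ocaml
(* Hypothetical model of the lend/borrow convention. With lend (the
   default) or borrow, the array is shared without copying; otherwise a
   copy is made so the dataset and the client's array are independent. *)
type 'o t = { data : 'o array }

let of_array ?(lend = true) (a : 'o array) : 'o t =
  if Array.length a = 0 then invalid_arg "of_array: empty array";
  { data = (if lend then a else Array.copy a) }

let to_array ?(borrow = false) (d : 'o t) : 'o array =
  if borrow then d.data else Array.copy d.data
```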
of_list ?obs l is a dataset from obs observations in l. Raises Invalid_argument if l is empty.
val to_list : 'o t -> 'o list
to_list d is d as a (non-empty) list.
module Csv : sig ... end
Comma separated values.
val show :
?ppf:Stdlib.Format.formatter ->
?first:int ->
?last:int ->
?count:count ->
?vars:'o Var.v list ->
'o t ->
unit
show is like pp but outputs on ppf (defaults to Format.std_formatter).
module Top : sig ... end
Toplevel support.