Module Evidence.Dataset

Observation sequences.

A dataset is a non-empty, immutable, indexed sequence of observations.

Datasets

type rand = int -> int

The type for randomness in ranges [0;n-1]. The function must be such that rand n is a uniformly distributed random number in the range [0;n-1].
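As a sketch, assuming only the OCaml standard library, both values below have the int -> int shape expected by rand. The names global_rand and seeded_rand are illustrative and are not part of this module.

```ocaml
(* Stdlib's Random.int bound returns an integer in [0; bound-1],
   which matches the contract stated for rand. *)
let global_rand : int -> int = Random.int

(* A reproducible source: the generator state is built from a fixed seed
   and captured by the returned closure. *)
let seeded_rand (seed : int) : int -> int =
  let state = Random.State.make [| seed |] in
  fun n -> Random.State.int state n
```

A seeded source is convenient when a shuffle or a sample has to be reproduced exactly, for instance in tests.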

type count = int

The type for counts. A count n must not be zero. If n is positive it represents the first n observations. If n is negative it represents the last -n observations.
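The sign convention can be modelled on a plain array. This is only a sketch: take_count is a hypothetical helper, not a function of this module, and clamping a count that exceeds the length is an assumption.

```ocaml
(* Hypothetical model of count on a plain array.
   Assumption: a count larger than the array is clamped to its length. *)
let take_count (n : int) (a : 'a array) : 'a array =
  if n = 0 then invalid_arg "count must not be zero" else
  let len = Array.length a in
  let k = min (abs n) len in
  if n > 0
  then Array.sub a 0 k           (* the first k elements *)
  else Array.sub a (len - k) k   (* the last k elements *)
```

Under this model take_count 2 keeps the first two elements and take_count (-2) keeps the last two.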

type 'o t

The type for datasets with observations of type 'o.

val init : ?obs:'o Obs.t -> int -> (int -> 'o) -> 'o t

init ?obs n f initializes a dataset with n observations drawn from f. obs is used to derive formatters, serializers and for certain function defaults. Raises Invalid_argument if n < 1.

val reobs : 'o Obs.t -> 'o t -> 'o t

reobs obs d observes d with obs.

Properties

val length : 'o t -> int

length d is the (positive) number of observations in d.

val obs : 'o t -> 'o Obs.t

obs d is the observation description of d.

val vars : 'o t -> 'o Var.v list

vars d is Obs.vars (obs d), the known variables of d.

val get : int -> 'o t -> 'o

get i d is the zero-based ith observation of d. Raises Invalid_argument if i is not in [0;length d - 1].

Traversing

val fold : (int -> 'o -> 'a -> 'a) -> 'o t -> 'a -> 'a

fold f d acc folds f over d starting with acc.

val iter : (int -> 'o -> unit) -> 'o t -> unit

iter f d iterates f over d.

Transforming

val map : ?obs:'p Obs.t -> (int -> 'o -> 'p) -> 'o t -> 'p t

map ~obs f d maps observations of d to obs observations with f. obs defaults to Obs.empty.

val filter_map : ?obs:'p Obs.t -> (int -> 'o -> 'p option) -> 'o t -> 'p t option

filter_map is like map but drops the observations that map to None. The result is None if all observations are dropped. obs defaults to Obs.empty.

val update : (int -> 'o -> 'o) -> 'o t -> 'o t

update f d updates each observation o of d with f.

val set : int -> 'o -> 'o t -> 'o t

set i o d is a new dataset with the ith observation of d set to o. For efficiency, do not call this function repeatedly; update is a better option.

Grouping

val append : 'o t -> 'o t -> 'o t

append d0 d1 appends the observations of d1 to those of d0. The obs of d0 is used for the result.

val concat : 'o t t -> 'o t

concat ds concatenates the observations of the groups ds. The obs of the first element is used for the result.

val group : by:('o -> 'o -> int) -> 'o t -> 'o t t

group ~by d groups observations of d by the equivalence relation determined by by. The sequence of groups is ordered by by. See also Dataset.Var.group.
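The way a comparison function determines both the groups and their order can be sketched on plain lists. group_model is a hypothetical name, and keeping the original order inside each group is an assumption of this sketch, not a documented property of group.

```ocaml
(* Hypothetical model of grouping on plain lists; not the library's code.
   Elements for which by returns 0 belong to the same group. *)
let group_model (by : 'a -> 'a -> int) (l : 'a list) : 'a list list =
  (* A stable sort orders the groups by by and keeps the original
     relative order inside each group. *)
  let sorted = List.stable_sort by l in
  let add groups x = match groups with
    | (y :: _ as g) :: rest when by x y = 0 -> (x :: g) :: rest
    | _ -> [x] :: groups
  in
  (* Groups and their elements were accumulated in reverse. *)
  List.fold_left add [] sorted |> List.rev_map List.rev
```

For example, grouping [1; 2; 3; 4; 5; 6] by comparing remainders modulo 3 yields the groups [3; 6], [1; 4] and [2; 5], in that order.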

Ordering

val sort : ?take:count -> ?stable:bool -> by:('o -> 'o -> int) -> 'o t -> 'o t

sort ~by d sorts observations of d using by. If stable is true (defaults to false), the order of d is kept on ties. If take is specified, only the first take (take > 0) or last -take (take < 0) observations of the sorted result are returned.
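The interaction of by, ties and take can be sketched on plain arrays. sort_take is a hypothetical helper that always sorts stably; clamping take to the length is an assumption.

```ocaml
(* Hypothetical model of a stable sort followed by take; not the library's code. *)
let sort_take ?take (by : 'a -> 'a -> int) (a : 'a array) : 'a array =
  let s = Array.copy a in          (* leave the input untouched *)
  Array.stable_sort by s;          (* ties keep their original order *)
  match take with
  | None -> s
  | Some 0 -> invalid_arg "take must not be zero"
  | Some n ->
      let len = Array.length s in
      let k = min (abs n) len in
      if n > 0 then Array.sub s 0 k else Array.sub s (len - k) k
```

With an ascending comparison, a negative take selects the largest elements: sort_take ~take:(-2) compare [|5; 1; 4; 2; 3|] is [|4; 5|].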

val shuffle : rand:rand -> 'o t -> 'o t

shuffle ~rand d randomly permutes the observations of d using rand for randomness.
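One common way to build a uniform permutation from a function of type rand is the Fisher-Yates shuffle. The sketch below applies it to a plain array; it illustrates how the rand contract is used and is not claimed to be the algorithm used by this module. shuffle_model is a hypothetical name.

```ocaml
(* Fisher-Yates shuffle on a copy of the array, driven by a rand function. *)
let shuffle_model ~(rand : int -> int) (a : 'a array) : 'a array =
  let r = Array.copy a in
  for i = Array.length r - 1 downto 1 do
    let j = rand (i + 1) in   (* j is uniform in [0; i] *)
    let tmp = r.(i) in
    r.(i) <- r.(j);
    r.(j) <- tmp
  done;
  r
```

Passing Random.int for rand gives a random permutation; passing a seeded function makes the permutation reproducible.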

val rev : 'o t -> 'o t

rev d reverses the order of the observations of d.

Extracting

val find_index : ?start:int -> (int -> 'o -> bool) -> 'o t -> int option

find_index ~start p d is the smallest index i, searching from start, if any, such that p i (get i d) is true. start defaults to 0 and None is returned if it is out of bounds.
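The search and its out-of-bounds behaviour can be sketched on a plain array. find_index_model is a hypothetical name; the sketch follows the description above and is not the library's code.

```ocaml
(* Hypothetical model of an indexed search with a start position. *)
let find_index_model ?(start = 0) (p : int -> 'a -> bool) (a : 'a array)
  : int option =
  let len = Array.length a in
  let rec loop i =
    if i >= len then None
    else if p i a.(i) then Some i
    else loop (i + 1)
  in
  (* An out-of-bounds start yields None rather than an exception. *)
  if start < 0 || start >= len then None else loop start
```

The predicate receives both the index and the element, so it can express conditions such as "the first element after position 2 that exceeds a threshold".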

val sub : ?first:int -> ?last:int -> ?count:count -> 'o t -> 'o t

sub ~first ~last ~count d are the observations of d whose indices exist in the range [first;last]. first defaults to 0 and last to length d - 1. If count is specified, only the first count (count > 0) or last -count (count < 0) elements of the range are returned.

Raises Invalid_argument if the result is empty, that is if there are no observations in the range for d or count = 0.
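The combination of a range and a count can be modelled on a plain array. sub_model is a hypothetical helper; reading "indices that exist" as clamping the range to the valid indices, and clamping count to the size of the range, are assumptions of this sketch.

```ocaml
(* Hypothetical model of range extraction followed by count. *)
let sub_model ?(first = 0) ?last ?count (a : 'a array) : 'a array =
  let len = Array.length a in
  let last = match last with None -> len - 1 | Some l -> l in
  (* Keep only the indices of [first;last] that exist in a. *)
  let first = max 0 first and last = min (len - 1) last in
  let n = last - first + 1 in
  if n <= 0 then invalid_arg "empty range" else
  let range = Array.sub a first n in
  match count with
  | None -> range
  | Some 0 -> invalid_arg "count = 0"
  | Some c ->
      let k = min (abs c) n in
      if c > 0 then Array.sub range 0 k else Array.sub range (n - k) k
```

Note that count applies to the range, not to the whole array: with first = 1, last = 3 and count = -2 on [|0; 1; 2; 3; 4|] the model returns [|2; 3|].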

val filter : (int -> 'o -> bool) -> 'o t -> 'o t option

filter p d are the observations of d that satisfy p or None if none do.

val sample : rand:rand -> int -> 'o t -> 'o t

sample ~rand n d samples n observations from d using rand as a random source. Raises Invalid_argument if n < 1.

val witness_of : group_by:('o -> 'o -> int) -> 'o t -> 'o t

witness_of ~group_by d groups the observations of d by group_by and, for each group, keeps the observation with the lowest index as the group representative. The result is ordered by group_by.
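The choice of representative can be sketched on plain lists. witness_model is a hypothetical name and the approach, a stable sort followed by dropping repeats, is one possible way to obtain the described result, not necessarily the library's.

```ocaml
(* Hypothetical model: one representative per group, lowest index first. *)
let witness_model (group_by : 'a -> 'a -> int) (l : 'a list) : 'a list =
  (* After a stable sort, the first element of each run of equal elements
     is the one that had the lowest index in l. *)
  let sorted = List.stable_sort group_by l in
  let rec keep_first = function
    | x :: y :: rest when group_by x y = 0 -> keep_first (x :: rest) (* drop y *)
    | x :: rest -> x :: keep_first rest
    | [] -> []
  in
  keep_first sorted
```

For example, with remainders modulo 3 as the grouping criterion, [5; 1; 2; 4; 3; 6] yields [3; 1; 5]: one witness per remainder, ordered by remainder.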

val distinct : ?vars:'o Var.v list -> 'o t -> 'o t

distinct ~vars d is witness_of ~group_by:(Var.order vars) d. These are the observations that have distinct values for the variables vars (defaults to vars d). Keeps the observation with the lowest index as the representative. The result is ordered in the lexicographic order of vars.

Raises Invalid_argument if vars is empty.

Variable processing

module Var : sig ... end

Variable processing.

Converting

val of_array : ?lend:bool -> ?obs:'o Obs.t -> 'o array -> 'o t

of_array ?lend ?obs a is a dataset from obs observations in a. Raises Invalid_argument if a is empty.

If lend is true (default), the client lends the array to the dataset for its own use; in particular, a should never be modified afterwards.

val to_array : ?borrow:bool -> 'o t -> 'o array

to_array ?borrow d is d as a (non-empty) array. If borrow is true the client should never mutate the resulting array (defaults to false).
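The reason for the lend and borrow contracts can be illustrated with a toy model. The Model module below is a hypothetical stand-in for a dataset, and it assumes that lending or borrowing means the array is shared rather than copied; the actual representation used by this module is not documented here.

```ocaml
(* Hypothetical stand-in for a dataset: an abstract wrapper over an array. *)
module Model : sig
  type 'a t
  val of_array : ?lend:bool -> 'a array -> 'a t
  val to_array : ?borrow:bool -> 'a t -> 'a array
  val get : int -> 'a t -> 'a
end = struct
  type 'a t = 'a array
  (* Assumption: lending and borrowing share the array, otherwise it is copied. *)
  let of_array ?(lend = true) a = if lend then a else Array.copy a
  let to_array ?(borrow = false) d = if borrow then d else Array.copy d
  let get i d = d.(i)
end

let a = [| 1; 2; 3 |]
let lent = Model.of_array a                 (* shares a *)
let copied = Model.of_array ~lend:false a   (* private copy of a *)

(* Mutating a lent array breaks the contract: the "immutable" value changes. *)
let () = a.(0) <- 42
(* Model.get 0 lent is now 42, while Model.get 0 copied is still 1. *)
```

In this model sharing avoids a copy at the price of a contract the client has to honour, which is why the safe direction (copying) is the default for to_array.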

val of_list : ?obs:'o Obs.t -> 'o list -> 'o t

of_list ?obs l is a dataset from obs observations in l. Raises Invalid_argument if l is empty.

val to_list : 'o t -> 'o list

to_list d is d as a (non-empty) list.

val of_t1 : 'a Var.type' -> 'a list -> ('a, 'a) Var.t * 'a t
val of_t2 : 'a Var.type' -> 'b Var.type' -> ('a * 'b) list -> (('a * 'b, 'a) Var.t * ('a * 'b, 'b) Var.t) * ('a * 'b) t

module Csv : sig ... end

Comma separated values.

Formatting

val pp : ?first:int -> ?last:int -> ?count:count -> ?vars:'o Var.v list -> unit -> 'o t Var.fmt

pp ~first ~last ~count ~vars () is a dataset formatter. If vars is specified only these variables are printed (defaults to the variables of the dataset). first, last and count have the semantics of sub, except that empty ranges are allowed.

val pp_top : 'o t Var.fmt

pp_top is a formatter for the toplevel. This is pp ~count:10 ().

val show : ?ppf:Stdlib.Format.formatter -> ?first:int -> ?last:int -> ?count:count -> ?vars:'o Var.v list -> 'o t -> unit

show is like pp but outputs on ppf (defaults to Format.std_formatter).

Toplevel support

module Top : sig ... end

Toplevel support.