Evidence
Rectangular datasets.
Evidence provides tools to describe and manipulate rectangular in-memory datasets.
A dataset is a sequence of observations (rows) represented by a sequence of OCaml values of a given type. Each observation can be decomposed into a set of variables (columns) via arbitrary projection functions.
This dataset organisation satisfies the principle of tidy data.
See the TODO.
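As a rough illustration of the row and column model described above, and independent of the actual Evidence API, an observation can be an ordinary OCaml record and a variable a named projection out of it. The person type, the var record and the column helper below are hypothetical names invented for this sketch; they are not the library's Var or Dataset definitions.

```ocaml
(* Hypothetical sketch: these types are NOT the Evidence API. *)

(* An observation (row) is a plain OCaml value. *)
type person = { name : string; age : int; height_cm : float }

(* A variable (column) is a name together with a projection function. *)
type ('o, 'a) var = { label : string; proj : 'o -> 'a }

let age = { label = "age"; proj = (fun p -> p.age) }

(* Projections are arbitrary functions, so derived variables are possible. *)
let height_m = { label = "height_m"; proj = (fun p -> p.height_cm /. 100.) }

(* A dataset is a sequence of observations. *)
let dataset = [
  { name = "Ada"; age = 36; height_cm = 165. };
  { name = "Alan"; age = 41; height_cm = 178. };
]

(* Extracting a column applies the projection to every row. *)
let column v obs = List.map v.proj obs

let ages = column age dataset
let heights_m = column height_m dataset
```

Each row stays a single typed value and each variable is one function of that value, which is one way to read the tidy data claim: one observation per row, one variable per column.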
module Var : sig ... end
Observation variables.
module Obs : sig ... end
Observations.
module Dataset : sig ... end
Observation sequences.
module O : sig ... end
Projective syntax.
TODO
Var is just a named typed projection. With respect to this, the O module DSL may not be such a good idea since we switch to polycompare.
na. Add a per-variable specific na value that is not considered? Seems footgunish, this will only be considered by DSL functions. However, it's nice to have a notion of na to suppress outliers without dropping the whole observation. Option looks painful. On ints we could use min_int. Maybe not a good idea… We should also define a consistent strategy for the treatment of nan values: is it really a good idea to silently ignore them?
nan handling/discovering.
Var.type'.Ordinal, rather have 'a -> int and consider them as is_numeric? Not convenient from an OCaml perspective.
Var.Type provide a case to allow injecting arbitrary arithmetic types via first-class modules?
Invalid_argument on non-numeric types?
Var.Type.value_set and Var.Type.value_map: without the polymorphic constraint on t these are only useful locally. Maybe remove.
Dataset.t.
Dataset make optional obs arguments required?
Dataset.Var.count return the typed variables with the result?
Float, Nominal, Any?
Dataset except for obs, does it make sense to exist if we have an immutable sequence, e.g. Pvec? There is the non-empty invariant, what happens if we drop it?
show do terminal plots?
Obs.set, raise Invalid_argument if var is not part of the product.
Ordinal but maybe built-in a few goodies.
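To make the na question above more concrete, here is a minimal sketch of two of the options it mentions: an option-typed projection and a per-variable sentinel such as min_int. All names (obs, mean_opt, mean_sentinel) are hypothetical and not part of Evidence; the sketch only shows how a missing value can be skipped without dropping the whole observation.

```ocaml
(* Hypothetical sketch, not the Evidence API. *)

type obs = { id : int; weight : int option }

let data = [
  { id = 1; weight = Some 70 };
  { id = 2; weight = None };  (* missing value, the observation is kept *)
  { id = 3; weight = Some 80 };
]

(* Mean of an [int option] projection, ignoring missing values.
   Returns [None] when no value is available. *)
let mean_opt proj obs =
  let sum, n =
    List.fold_left
      (fun (sum, n) o ->
         match proj o with
         | None -> (sum, n)
         | Some v -> (sum +. float_of_int v, n + 1))
      (0., 0) obs
  in
  if n = 0 then None else Some (sum /. float_of_int n)

(* Same computation with a sentinel [na] value instead of an option. *)
let mean_sentinel ~na proj obs =
  mean_opt (fun o -> let v = proj o in if v = na then None else Some v) obs

let m1 = mean_opt (fun o -> o.weight) data

let m2 =
  mean_sentinel ~na:min_int
    (fun o -> match o.weight with None -> min_int | Some v -> v)
    data
```

The sentinel variant avoids wrapping every value in an option, but it silently treats a legitimate min_int as missing, which is the kind of footgun the note worries about.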