Evidence
Rectangular datasets.
Evidence provides tools to describe and manipulate rectangular in-memory datasets.
A dataset is a sequence of observations (rows) represented by a sequence of OCaml values of a given type. Each observation can be decomposed into a set of variables (columns) via arbitrary projection functions.
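The idea of rows as plain OCaml values and columns as projection functions can be sketched in a few lines. This is only an illustration in plain OCaml: the types `person` and `var` and the function `column` are hypothetical names chosen for this sketch and are not the Evidence API.

```ocaml
(* Hypothetical sketch: none of these names come from the Evidence API. *)

(* An observation is an ordinary OCaml value, here a record. *)
type person = { name : string; age : int; height_m : float }

(* A variable is a named projection from an observation to a value. *)
type ('o, 'a) var = { label : string; proj : 'o -> 'a }

let age = { label = "age"; proj = (fun p -> p.age) }

(* A variable need not be a record field: any function of the
   observation can serve as a projection. *)
let height_cm = { label = "height_cm"; proj = (fun p -> p.height_m *. 100.) }

(* A dataset is a sequence of observations (rows). *)
let people = [
  { name = "Ada"; age = 36; height_m = 1.65 };
  { name = "Alan"; age = 41; height_m = 1.78 };
  { name = "Grace"; age = 85; height_m = 1.60 };
]

(* Extract one column by applying the variable's projection to every row. *)
let column var rows = List.map var.proj rows

let ages = column age people
let heights_cm = column height_cm people
```

Because each row stays a single typed value, adding a derived column such as `height_cm` only requires a new projection; the stored observations are left untouched.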
This dataset organisation satisfies the principle of tidy data.
See the TODO.
module Var : sig ... end
Observation variables.
module Obs : sig ... end
Observations.
module Dataset : sig ... end
Observation sequences.
module O : sig ... end
Projective syntax.
TODO
Var is just a named typed projection. With respect to this the O module DSL may not be such a good idea since we switch to polycompare.
na. Add a per variable specific na value that is not considered? Seems footgunish, this will only be considered by DSL functions. However it's nice to have a notion of na to suppress outliers without dropping the whole observation. Option looks painful. On ints. We could use min_int. Maybe not a good idea… We should also define a consistent strategy for the treatment of nan values, is it really a good idea to silently ignore them?
nan handling/discovering.
Var.type'.Ordinal, rather have 'a -> int and consider them as is_numeric? Not convenient from an OCaml perspective.
Var.Type provide a case to allow to inject arbitrary arithmetic types via first class modules?
Invalid_argument on non numeric types?
Var.Type.value_set and Var.Type.value_map without the polymorphic constraint on t these are only useful locally. Maybe remove.
Dataset.t.
Dataset make optional obs arguments required?
Dataset.Var.count return the typed variables with the result?
Float, Nominal, Any?
Dataset except for obs, does it make sense to exist if we have an immutable sequence, e.g. Pvec. There is the non empty invariant, what happens if we drop it?
show do terminal plots?
Obs.set, raise Invalid_argument if var is not part of the product.
Ordinal but maybe built-in a few goodies.
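The notes above ask whether silently ignoring nan values is a good idea. One possible middle ground, sketched here in plain OCaml, is to skip nan values when summarising a column but also report how many were skipped, so that the caller can notice them. The function `mean_skipping_nan` is a hypothetical helper written for this illustration; it is not part of Evidence and says nothing about how the library currently behaves.

```ocaml
(* Hypothetical helper, not part of Evidence: the mean of a float column
   that skips nan values and also returns the number of values skipped. *)
let mean_skipping_nan (xs : float list) : float * int =
  let sum, n, skipped =
    List.fold_left
      (fun (sum, n, skipped) x ->
         if Float.is_nan x then (sum, n, skipped + 1)
         else (sum +. x, n + 1, skipped))
      (0., 0, 0) xs
  in
  (* With no usable value the mean is undefined, so return nan. *)
  let mean = if n = 0 then Float.nan else sum /. float_of_int n in
  (mean, skipped)
```

Returning the skipped count keeps the convenience of ignoring nan while making the omission visible; an `option`-based encoding would be stricter but, as the notes say, more painful to use.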