Module Dataset.Var

Variable processing.

Note. Unless otherwise specified, Var.Float variables treat nans as missing values. nan values can still be returned, for example if nan is the only value available.

Summarize

val count : ('o, 'a) Var.t -> 'o t -> ('a * int) t

count var d groups observations by unique values of variable var and reports the number of observations found in each group. Variable values are sorted in increasing order. The dataset can be accessed with Var.for_var_count var. See also group.
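As a rough illustration of the counting described above, here is a minimal sketch on plain lists rather than on the library's dataset type. The name count_values and the explicit compare argument are hypothetical and are not part of this module.

```ocaml
(* Hypothetical helper, not the library's implementation: counts the
   occurrences of each distinct value of a list. The result is sorted
   in increasing order according to [compare]. *)
let count_values (compare : 'a -> 'a -> int) (xs : 'a list) : ('a * int) list =
  let add acc x = match acc with
    | (v, n) :: rest when compare v x = 0 -> (v, n + 1) :: rest
    | _ -> (x, 1) :: acc
  in
  (* Sorting first makes equal values adjacent, so one pass suffices. *)
  List.rev (List.fold_left add [] (List.sort compare xs))
```

For example, count_values String.compare ["b"; "a"; "b"] evaluates to [("a", 1); ("b", 2)].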

val sum : ('o, 'a) Var.t -> 'o t -> float

sum var d is the sum of the values of var in d. nan values are excluded from the computation. This is nan on non-numeric types or if there are only nans. On floats uses the Kahan-Babuška algorithm (§3).
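To give an idea of what compensated summation looks like, here is a minimal sketch of Neumaier's variant of Kahan-Babuška summation on a plain float list. The name kb_sum is hypothetical, the variant may differ from the one the library actually uses, and returning nan on an empty list is a choice made for this sketch only.

```ocaml
(* Hypothetical sketch, not the library's code: Kahan-Babuška summation
   (Neumaier variant). nan values are skipped. The result is nan when no
   non-nan value is seen (an assumption for the empty case). *)
let kb_sum (xs : float list) : float =
  let step (sum, comp, seen) x =
    if Float.is_nan x then (sum, comp, seen) else
    let t = sum +. x in
    let comp =
      if Float.abs sum >= Float.abs x
      then comp +. ((sum -. t) +. x)  (* low-order bits of x were lost *)
      else comp +. ((x -. t) +. sum)  (* low-order bits of sum were lost *)
    in
    (t, comp, true)
  in
  let sum, comp, seen = List.fold_left step (0., 0., false) xs in
  if seen then sum +. comp else Float.nan
```

On the input [1.0; 1e100; 1.0; -1e100] a naive left-to-right sum yields 0.0, while kb_sum yields 2.0 because the compensation term recovers the two small addends.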

val mean : ('o, 'a) Var.t -> 'o t -> float

mean var d is the arithmetic mean of variable var in d. nan values are excluded from the computation. This is nan on non-numeric types or if there are only nans. Uses sum to compute the result.

val quantile : ('o, 'a) Var.t -> 'o t -> float -> float

quantile var d is a function quant such that quant p is the p-quantile of d on variable var using the R-7 definition. quant clamps its argument to [0;1]. nan values are excluded from the computation. The function is Fun.const nan on non-numeric types or if there are only nans.
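The R-7 definition interpolates linearly between the two order statistics surrounding position (n - 1) * p in the sorted values. A minimal sketch on a plain float list follows. The name quantile_r7 is hypothetical, and returning nan for a nan argument is an assumption of the sketch, not documented behaviour.

```ocaml
(* Hypothetical sketch, not the library's code: R-7 quantile function.
   nan values are dropped. The sort happens once, so the returned
   function can be applied to many values of p cheaply. *)
let quantile_r7 (xs : float list) : float -> float =
  let a = Array.of_list (List.filter (fun x -> not (Float.is_nan x)) xs) in
  Array.sort Float.compare a;
  let n = Array.length a in
  if n = 0 then Fun.const Float.nan else
  fun p ->
    if Float.is_nan p then Float.nan else  (* assumption of this sketch *)
    let p = if p < 0. then 0. else if p > 1. then 1. else p in
    let h = float_of_int (n - 1) *. p in   (* fractional index *)
    let lo = int_of_float (Float.floor h) in
    let hi = min (lo + 1) (n - 1) in
    a.(lo) +. (h -. float_of_int lo) *. (a.(hi) -. a.(lo))
```

For the values [1.; 2.; 3.; 4.], the sketch gives 1.75 for p = 0.25 and 2.5 for p = 0.5.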

val median : ('o, 'a) Var.t -> 'o t -> float

median var d is quantile var d 0.5. If you also need other quantiles, use the function returned by quantile var d.

val variance : ('o, 'a) Var.t -> 'o t -> float

variance var d is the unbiased sample variance of variable var in d computed using Welford's algorithm. This is nan on non-numeric types or if there are only nans or fewer than two numbers.

val deviation : ('o, 'a) Var.t -> 'o t -> float

deviation var d is sqrt (variance var d), the standard deviation of variable var in d.
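Welford's algorithm computes the variance in a single pass by maintaining a running mean and a running sum of squared deviations. A minimal sketch on a plain float list follows. The names welford_variance and std_deviation are hypothetical, and skipping nan values is assumed from the note at the top of this module.

```ocaml
(* Hypothetical sketch, not the library's code: unbiased sample variance
   with Welford's one-pass update. nan values are skipped. The result is
   nan if fewer than two numbers remain. *)
let welford_variance (xs : float list) : float =
  let step (n, mean, m2) x =
    if Float.is_nan x then (n, mean, m2) else
    let n = n + 1 in
    let delta = x -. mean in
    let mean = mean +. delta /. float_of_int n in
    (* Uses the old delta and the updated mean. *)
    let m2 = m2 +. delta *. (x -. mean) in
    (n, mean, m2)
  in
  let n, _, m2 = List.fold_left step (0, 0., 0.) xs in
  if n < 2 then Float.nan else m2 /. float_of_int (n - 1)

(* Standard deviation as the square root of the variance. *)
let std_deviation (xs : float list) : float = sqrt (welford_variance xs)
```

For the values [1.; 2.; 3.; 4.] the sum of squared deviations is 5, so the sketch returns 5/3 for the variance.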

Grouping

val group : by:('o, 'a) Var.t -> 'o t -> ('a * 'o t) t

group ~by d groups observations of d by the equivalence relation determined by variable by. The sequence of groups is ordered by Var.compare_value by.

Range

val min : ('o, 'a) Var.t -> 'o t -> 'a

min var d is the minimal value of var in d as determined by Evidence.Var.min_value.

val max : ('o, 'a) Var.t -> 'o t -> 'a

max var d is the maximal value of var in d as determined by Evidence.Var.max_value.

val min_max : ('o, 'a) Var.t -> 'o t -> 'a * 'a

min_max var d is (min var d, max var d) but more efficient.

val values : ('o, 'a) Var.t -> 'o t -> 'a t

values var d are the unique values found in var sorted by increasing Var.compare_value order.

val dom : (module Stdlib.Set.S with type elt = 'a and type t = 'set) -> ('o, 'a) Var.t -> 'o t -> 'set

dom is values but as a set.

Transforming

val update : ('o, 'a) Var.t -> (int -> 'o -> 'a) -> 'o t -> 'o t

update var f d updates variable var of each observation of d with f. Note that this fails on observations with absurd products.

val set : ('o, 'a) Var.t -> 'a -> int -> 'o t -> 'o t

set var v i d sets variable var of the ith observation of d to v. For efficiency, do not call this function repeatedly; update is a better option.

val cumsum : ('o, 'a) Var.t -> 'o t -> float t

cumsum var d is the cumulative sum of variable var.
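As an illustration, a cumulative sum on a plain float list can be sketched as below. The name cumsum_list is hypothetical. The sketch uses plain floating point addition and does nothing special for nan values, and the library's behaviour on both points may differ.

```ocaml
(* Hypothetical sketch, not the library's code: the ith element of the
   result is the sum of the first i + 1 elements of the input. *)
let cumsum_list (xs : float list) : float list =
  let step (acc, rev) x =
    let acc = acc +. x in
    (acc, acc :: rev)
  in
  let _, rev = List.fold_left step (0., []) xs in
  List.rev rev
```

For example, cumsum_list [1.; 2.; 3.] evaluates to [1.; 3.; 6.].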