Dataset.Var
Variable processing.
Note. Unless otherwise specified, Var
.Float variables treat nan
s as missing values. nan
values can still be returned, for example if that's the single value available.
count var d
groups observations by unique values of variable var
and reports the number of observations found in each group. Variable values are sorted in increasing order; the dataset can be accessed with Var
.for_var_count var
. See also group
.
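The grouping and counting described above can be sketched on plain lists. This is only an illustration: count_values is a hypothetical helper and not part of the library, and it returns an association list rather than a dataset.

```ocaml
(* Hypothetical helper on plain lists; not the library's [count].
   Sorts the values, then counts runs of equal values, so the result
   is ordered by increasing value. *)
let count_values (compare : 'a -> 'a -> int) (xs : 'a list) : ('a * int) list =
  let sorted = List.sort compare xs in
  let rec loop acc = function
    | [] -> List.rev acc
    | x :: rest ->
        (match acc with
         | (y, n) :: acc' when compare x y = 0 ->
             loop ((y, n + 1) :: acc') rest
         | _ -> loop ((x, 1) :: acc) rest)
  in
  loop [] sorted

let counts = count_values String.compare ["b"; "a"; "b"; "c"; "a"; "b"]
```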
sum var d
is the sum of the values of var
in d
. nan
values are excluded from the computation. This is nan
on non-numeric types or if there are only nan
s. On floats uses the Kahan-Babuška algorithm (§3).
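For illustration, compensated summation with nan exclusion might look like the sketch below. It assumes the Neumaier formulation of Kahan-Babuška summation and works on a plain float list; sum_floats is a hypothetical name and the library's actual implementation may differ.

```ocaml
(* Hypothetical helper; a sketch of Kahan-Babuška summation in its
   Neumaier formulation (an assumption, the source does not say which
   variant is used). nan values are skipped; the result is nan if no
   non-nan value was seen. *)
let sum_floats (xs : float list) : float =
  let step (seen, sum, comp) x =
    if Float.is_nan x then (seen, sum, comp) else
    let t = sum +. x in
    (* Accumulate the low-order bits lost when computing [t]. *)
    let comp =
      if Float.abs sum >= Float.abs x
      then comp +. ((sum -. t) +. x)
      else comp +. ((x -. t) +. sum)
    in
    (true, t, comp)
  in
  let seen, sum, comp = List.fold_left step (false, 0., 0.) xs in
  if seen then sum +. comp else Float.nan
```

A naive left-to-right sum of 1., 1e100, 1., -1e100 loses both small terms, while the compensated version keeps them.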
mean var d
is the arithmetic mean of variable var
in d
. nan
values are excluded from the computation. This is nan
on non-numeric types or if there are only nan
s. Uses sum
to compute the result.
quantile var d
is a function quant
such that quant p
is the p
-quantile of d
on variable var
using the R-7 definition. quant
clamps its argument to [0;1]. nan
values are excluded from the computation. The function is Fun.const nan
on non-numeric types or if there are only nan
s.
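The R-7 definition interpolates linearly between the two order statistics that surround position (n - 1) * p in the sorted data. A minimal sketch on a plain float list follows; quantile_r7 is a hypothetical name, and returning nan for a nan argument is an assumption not stated above.

```ocaml
(* Hypothetical helper; sketch of the R-7 quantile definition.
   nan values are dropped before sorting. The data is sorted once and
   the returned function can then be applied to several probabilities. *)
let quantile_r7 (xs : float list) : float -> float =
  let a = Array.of_list (List.filter (fun x -> not (Float.is_nan x)) xs) in
  Array.sort Float.compare a;
  let n = Array.length a in
  if n = 0 then Fun.const Float.nan else
  fun p ->
    (* Assumption: a nan probability yields nan. *)
    if Float.is_nan p then Float.nan else
    let p = Float.min 1. (Float.max 0. p) in      (* clamp to [0;1] *)
    let h = float_of_int (n - 1) *. p in          (* fractional index *)
    let lo = int_of_float (Float.floor h) in
    let hi = min (lo + 1) (n - 1) in
    a.(lo) +. (h -. float_of_int lo) *. (a.(hi) -. a.(lo))

let quant = quantile_r7 [4.; Float.nan; 1.; 3.; 2.]
let median = quant 0.5
```

Sorting once and returning a closure is why asking for several quantiles through the returned function is cheaper than repeated independent calls.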
median var d
is quantile var d 0.5
. If you also need other quantiles, use the function returned by quantile var d
.
variance var d
is the unbiased sample variance of variable var
in d
computed using Welford's algorithm. This is nan
on non-numeric types or if there are only nan
s or fewer than two numbers.
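Welford's algorithm updates a running mean and a running sum of squared deviations in a single pass. The sketch below shows the idea on a plain float list; variance_welford is a hypothetical name, and skipping nan values follows the general note at the top of this section.

```ocaml
(* Hypothetical helper; sketch of Welford's single-pass algorithm for
   the unbiased sample variance. nan values are skipped; the result is
   nan when fewer than two numbers remain. *)
let variance_welford (xs : float list) : float =
  let step (n, mean, m2) x =
    if Float.is_nan x then (n, mean, m2) else
    let n = n + 1 in
    let delta = x -. mean in
    let mean = mean +. delta /. float_of_int n in
    (* Uses the deviation from the old mean times the deviation from
       the updated mean. *)
    let m2 = m2 +. delta *. (x -. mean) in
    (n, mean, m2)
  in
  let n, _, m2 = List.fold_left step (0, 0., 0.) xs in
  if n < 2 then Float.nan else m2 /. float_of_int (n - 1)
```

The standard deviation is then the square root of this value.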
deviation var d
is sqrt (variance var d)
, the standard deviation of variable var
in d
.
group ~by:var d
groups observations of d
by the equivalence relation determined by variable by
. The sequence of groups is ordered by Var.compare_value by
.
min var d
is the minimal value of var
in d
as determined by Evidence.Var.min_value
.
max var d
is the maximal value of var
in d
as determined by Evidence.Var.max_value
.
min_max var d
is (min var d, max var d)
but more efficient.
values var d
are the unique values found in var
sorted by increasing Var
.compare_value order.
val dom :
(module Stdlib.Set.S with type elt = 'a and type t = 'set) ->
('o, 'a) Var.t ->
'o t ->
'set
dom
is values
but as a set.
update var f d
updates variable var
of each observation of d
with f
. Note that this fails on observations with absurd products.
set var v i d
sets variable var
of the i
th observation of d
to v
. For efficiency, do not call this function repeatedly; update
is a better option.
cumsum var d
is the cumulative sum of variable var
.
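As a reminder of what a cumulative sum produces, here is a minimal sketch on a plain list of integers. cumsum_ints is a hypothetical helper; it says nothing about how the library handles floats or nan values.

```ocaml
(* Hypothetical helper; each output element is the sum of all input
   elements up to and including that position. *)
let cumsum_ints (xs : int list) : int list =
  let step (total, acc) x =
    let total = total + x in
    (total, total :: acc)
  in
  let _, acc = List.fold_left step (0, []) xs in
  List.rev acc

let running = cumsum_ints [1; 2; 3; 4]
```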