commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Dunning (JIRA)" <>
Subject [jira] [Commented] (MATH-418) add a storeless version of Percentile
Date Sun, 23 Mar 2014 16:38:45 GMT


Ted Dunning commented on MATH-418:

The t-digest is also available.  

You can

This paper contains accuracy comparisons against the DataFu StreamingQuantile implementation
of GK01 and against the Q-digest algorithm.  T-digest dominates both in terms of space and
accuracy.  I haven't tested for speed yet since the t-digest implementation is improving rapidly.
 T-digest is also a very simple algorithm conceptually which helps with accurate implementation.
 T-digest is particularly good at maintaining accuracy for extreme quantiles and for highly
skewed distribution.  GK01 does well on skewed distributions, but maintains constant absolute
accuracy rather than relative accuracy.  Q-digest depends on integer representations and thus
fails badly on sufficiently skewed distributions.  

The library involved is available via maven and is apache licensed.  Apache Commons Math has
a "no dependency" policy which might mean that sucking in the code would be a better option
than simply linking to this.

The standard of the art before t-digest is generally considered to be either Greenwald and
Khanna's algorithm GK01 (what DataFu uses) or the Q-digest (what stream-lib used before they
adopted t-digest).  References are in the paper above.

In case it isn't obvious, source code is available on github at

The p^2 algorithm is actually quite old and is far from the state of the art.

> add a storeless version of Percentile
> -------------------------------------
>                 Key: MATH-418
>                 URL:
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 2.1
>            Reporter: Luc Maisonobe
>             Fix For: 4.0
> The Percentile class can handle only in-memory data.
> It would be interesting to use an on-line algorithm to estimate quantiles as a storeless
> An example of such an algorithm is the exponentially weighted stochastic approximation
 described in a 2000 paper by Fei Chen ,  Diane Lambert  and José C. Pinheiro "Incremental
Quantile Estimation for Massive Tracking" which can be retrieved from CiteSeerX at [].

This message was sent by Atlassian JIRA

View raw message