[ https://issues.apache.org/jira/browse/MATH418?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=13944470#comment13944470
]
Ted Dunning commented on MATH418:

The tdigest is also available.
You can read more at
https://github.com/tdunning/tdigest/blob/master/docs/tdigestpaper/histo.pdf?raw=true
This paper contains accuracy comparisons against the DataFu StreamingQuantile implementation
of GK01 and against the Qdigest algorithm. Tdigest dominates both in terms of space and
accuracy. I haven't tested for speed yet since the tdigest implementation is improving rapidly.
Tdigest is also a very simple algorithm conceptually which helps with accurate implementation.
Tdigest is particularly good at maintaining accuracy for extreme quantiles and for highly
skewed distribution. GK01 does well on skewed distributions, but maintains constant absolute
accuracy rather than relative accuracy. Qdigest depends on integer representations and thus
fails badly on sufficiently skewed distributions.
The library involved is available via maven and is apache licensed. Apache Commons Math has
a "no dependency" policy which might mean that sucking in the code would be a better option
than simply linking to this.
The standard of the art before tdigest is generally considered to be either Greenwald and
Khanna's algorithm GK01 (what DataFu uses) or the Qdigest (what streamlib used before they
adopted tdigest). References are in the paper above.
In case it isn't obvious, source code is available on github at
https://github.com/tdunning/tdigest
The p^2 algorithm is actually quite old and is far from the state of the art.
> add a storeless version of Percentile
> 
>
> Key: MATH418
> URL: https://issues.apache.org/jira/browse/MATH418
> Project: Commons Math
> Issue Type: New Feature
> Affects Versions: 2.1
> Reporter: Luc Maisonobe
> Fix For: 4.0
>
>
> The Percentile class can handle only inmemory data.
> It would be interesting to use an online algorithm to estimate quantiles as a storeless
statistic.
> An example of such an algorithm is the exponentially weighted stochastic approximation
described in a 2000 paper by Fei Chen , Diane Lambert and JosÃ© C. Pinheiro "Incremental
Quantile Estimation for Massive Tracking" which can be retrieved from CiteSeerX at [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.1580].

This message was sent by Atlassian JIRA
(v6.2#6252)
