mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Dunning (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1361) Online algorithm for computing accurate Quantiles using 1-D clustering
Date Thu, 21 Nov 2013 05:07:36 GMT


Ted Dunning commented on MAHOUT-1361:


I can't comment specifically (which is part of why I implemented this).  My impression is
that Q-digests dominate count min sketches in terms of memory / accuracy trade-off, but I
am not at all sure.  I am absolutely certain that t-digests are better than Q-digests in terms
of memory / accuracy for extreme quantiles since the errors will be measured in a few ppm
instead of a few percent.  If you used a flat centroid size limit to compare against Q-digests,
the memory consumption should be quite similar.  In any case, t-digests are much simpler conceptually
and do not have the difficulty that Q-digests have that the proof of accuracy is based on
the batch mode rather than the on-line case.

> Online algorithm for computing accurate Quantiles using 1-D clustering
> ----------------------------------------------------------------------
>                 Key: MAHOUT-1361
>                 URL:
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.9
>            Reporter: Suneel Marthi
>            Assignee: Suneel Marthi
>             Fix For: 0.9
>         Attachments: MAHOUT-1361.patch
> Implementation of Ted Dunning's paper and initial work on this subject. See
for the paper.
> An on-line algorithm for computing approximations of rank-based statistics that allows
controllable accuracy. This algorithm can also be used to compute hybrid statistics such as
trimmed means in addition to computing arbitrary quantiles.

This message was sent by Atlassian JIRA

View raw message