This approach is fine for relatively well-behaved distributions. Anything
more skewed than, say, an exponential, or as long-tailed as a t(3)
distribution, is likely to have trouble with this approach.
See
http://searchlucene.com/jd/mahout/math/org/apache/mahout/math/stats/OnlineSummarizer.html
for the alternative I have been suggesting. It can keep accurate estimates
of any quantile that you like.
On Mon, Mar 14, 2011 at 5:17 PM, sebb <sebbaz@gmail.com> wrote:
>
> In JMeter we needed to display long running percentiles without using
> excess memory, and someone came up with the idea of using buckets for
> ranges of values. So instead of keeping details on each sample elapsed
> time, we increment the count for the appropriate bucket.
>
> If the range of values is too large to use a single bucket for each
> value, each bucket can represent a range of values.
> These ranges can potentially be nonuniform though that does
> complicate the calculations.
>
> JMeter actually uses a TreeMap for the values and counts - the values
> need to be sorted in order to calculate percentiles.
>
> Depending on the dataset, it might be possible to use fixed arrays
> instead of the TreeMap.
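
For reference, the bucketing idea quoted above might look roughly like the
sketch below. This is only an illustration, not JMeter's actual code: the
class name BucketedPercentiles, the fixed bucket width, and the nearest-rank
percentile rule are all my assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of bucketed percentile tracking; not taken from JMeter.
final class BucketedPercentiles {
    private final long bucketWidth;                              // assumed uniform width
    private final TreeMap<Long, Long> counts = new TreeMap<>();  // bucket lower bound -> count
    private long total = 0;

    BucketedPercentiles(long bucketWidth) {
        if (bucketWidth <= 0) {
            throw new IllegalArgumentException("bucketWidth must be positive");
        }
        this.bucketWidth = bucketWidth;
    }

    // Increment the count of the bucket that contains the value.
    public void add(long value) {
        long bucket = Math.floorDiv(value, bucketWidth) * bucketWidth;
        counts.merge(bucket, 1L, Long::sum);
        total++;
    }

    // Lower bound of the bucket holding the p-th percentile (nearest-rank rule),
    // so the answer is only accurate to within one bucket width.
    public long percentile(double p) {
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long rank = Math.max(1L, (long) Math.ceil(p * total / 100.0));
        long seen = 0;
        // TreeMap iterates in ascending key order, which is why a sorted map is needed.
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= rank) {
                return e.getKey();
            }
        }
        return counts.lastKey();
    }
}
```

Memory is bounded by the number of distinct buckets rather than the number
of samples. The cost is that each answer is only as precise as the bucket
width.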
