commons-dev mailing list archives

From Phil Steitz
Subject Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics
Date Tue, 14 Oct 2014 04:17:22 GMT
On 10/13/14 8:55 PM, venkatesha murthy wrote:
> On Tue, Oct 14, 2014 at 6:05 AM, Phil Steitz wrote:
>> On 10/13/14 1:04 PM, venkatesha murthy wrote:
>>> Adding a bit more on this:
>>> a) The DescriptiveStatisticalSummary actually handles the rest of the
>>> functions, such as addValue, getPercentile, etc.
>>> b) I have added addValue() because it is important to be able to treat
>>> both the storeless and the stored variants through an interface.
>>> c) A case in point for (b): I was trying out lock-based and lock-free
>>> variants of the descriptive statistical summary, and the code was very
>>> concise and consistent with an interface that has all the functions
>>> common to every variant.
>>> d) The lock-based and lock-free variants are not part of this patch, as
>>> I am still working through them.
>>> However, I feel getPercentile can definitely add value. Please let me
>>> know if I could move all the relevant methods of
>>> DescriptiveStorelessStatistics into the statistical summary (such as
>>> kurtosis, skewness, etc.), and then we could just use SummaryStatistics.
>> I am not sure I understand what you are proposing.  Currently, we
>> have two statistical "aggregates" for descriptive univariate stats:
>> SummaryStatistics - aggregates "storeless" statistics over a stream
>> of data that is not stored in memory
>> DescriptiveStatistics - provides an extended set of statistics, some
>> of which require that the full set of data be stored in memory
> OK. I am sorry for the confusion here. I understand the intent now.
> However, what I wanted to convey was that all the statistics
> supported in the current DescriptiveStatistics can be supported in a
> storeless variant as well (e.g. skewness, kurtosis, percentile).

No.  For example, exact percentiles, or even arbitrary percentiles
where the quantile (e.g. quartile) is not specified in advance, can't
be computed without storing the data.  Also, DescriptiveStatistics
supports a rolling window, and the stats it implements can make use of
multi-pass algorithms.
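
A minimal sketch of that distinction in plain Java.  This is not
Commons Math code; the class and method names are made up for
illustration.  The running mean and variance need only constant
state, whereas the exact percentile below (using the simple
nearest-rank definition, which is an assumption here and not
necessarily what any library uses) has to sort a full copy of the
data.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Commons Math.
final class StorelessVsStoredSketch {
    // Constant-size state for the streaming ("storeless") moments.
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    // Welford-style update: each value is folded in and then discarded.
    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    // Sample variance (n - 1 in the denominator); NaN with fewer than 2 values.
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    // Exact nearest-rank percentile for p in (0, 100].
    // Needs the whole data set in memory, because it sorts a copy of it.
    static double exactPercentile(double[] data, double p) {
        if (data.length == 0 || !(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("need data and 0 < p <= 100");
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 3, 2, 4};
        StorelessVsStoredSketch stats = new StorelessVsStoredSketch();
        for (double x : data) {
            stats.addValue(x); // the stream is never stored
        }
        System.out.println("mean     = " + stats.getMean());
        System.out.println("variance = " + stats.getVariance());
        System.out.println("median   = " + exactPercentile(data, 50));
    }
}
```

Note that exactPercentile can answer for any p after the fact, which
is exactly what a fixed-quantile streaming estimator cannot do.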

> Therefore, what I was proposing is a common interface that has all
> these methods too, e.g. (we can change the name if needed):
> DescriptiveStatisticalSummary<S extends UnivariateStatistics>
>     extends StatisticalSummary {
>      getKurtosis();
>      getPercentile();
>      getSkewness();
>      // Add mutation methods as well
>      addValue(double d);
>      // Provide additional builder methods for injecting custom percentile,
>      // kurtosis, skewness, variance, etc.
>      withPercentile(S percentile);
>      withKurtosis(S kurtosis);
> }

Per comments above, the contracts of these aggregates are
different.  We have also moved away from defining abstract
interfaces as these end up creating problems when we want to add
things (as in the subject of this thread).
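
To illustrate the kind of problem meant here, a small sketch with
hypothetical types (none of these exist in Commons Math).  Once user
code implements an interface, adding an abstract method to it stops
that code from compiling.  Java 8 default methods are one way to
soften this, shown below purely as an assumption about how it could
be handled, not as what the library does.

```java
// Hypothetical interface standing in for a published summary contract.
interface BasicSummary {
    double getMean();

    // Added in a later "release".  Without the default body, every existing
    // implementor (such as LegacySummary below) would no longer compile.
    default double getMedian() {
        return Double.NaN; // placeholder meaning "not supported here"
    }
}

// Stands in for user code written against the first version of the interface.
final class LegacySummary implements BasicSummary {
    private final double mean;

    LegacySummary(double mean) {
        this.mean = mean;
    }

    @Override
    public double getMean() {
        return mean;
    }
}

final class InterfaceEvolutionDemo {
    public static void main(String[] args) {
        BasicSummary s = new LegacySummary(4.2);
        System.out.println("mean = " + s.getMean());
        // The old implementor still works but only gets the fallback value.
        System.out.println("median supported? " + !Double.isNaN(s.getMedian()));
    }
}
```

With a concrete class instead of an interface, a new getter can simply
be added without touching any user code.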

>> The subject of this thread was a proposal to add quartiles to
>> SummaryStatistics, as the new(ish) PSquarePercentile allows those
>> statistics to be computed without storing the data.
> Agreed. I was just adding points on how we can bring both
> DescriptiveStatistics and SummaryStatistics under a common interface for
> all the stats.
>> Phil
>>> On Tue, Oct 14, 2014 at 1:15 AM, venkatesha murthy wrote:
>>>> Hi Phil,
>>>> Though I did not add to StatisticalSummary, I was actually working on a
>>>> DescriptiveStatisticalSummary for all the storeless variants, including
>>>> PSquarePercentile. Would it help if SummaryStatistics implemented
>>>> an extended interface such as DescriptiveStatisticalSummary (below)?
>>>> That said, I actually wanted to discuss the new storeless variant of
>>>> descriptive statistics:
>>>> a) DescriptiveStatisticalSummary - an extended interface for
>>>> StatisticalSummary (adds a generic type that can cater for both stored
>>>> and storeless variants)
>>>> b) DescriptiveStorelessStatistics - storeless variant of
>>>> DescriptiveStatistics
>>>> c) SynchronizedDescriptiveStorelessStatistics - a synchronized wrapper.
>>>> Test case classes are added for the same.
>>>> Please let me know about this; I could also accommodate the changes to
>>>> summary stats based on this change.
>>>> Also, please let me know if this could be raised as a JIRA ticket to
>>>> pursue.
>>>> Thanks
>>>> Murthy
>>>> On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz wrote:
>>>>> Now that we have a "storeless" percentile estimator, we can add
>>>>> quartile computation to SummaryStatistics.  Any objections to my
>>>>> adding this?  I could optionally add a boolean constructor argument
>>>>> to avoid the overhead of maintaining these stats.  Or more
>>>>> generally, add a bitfield encoding the exact set of stats the user
>>>>> wants to maintain.  If there are no objections to the addition, I
>>>>> will open a JIRA.
>>>>> Phil
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail:
>>>>> For additional commands, e-mail:
