accumulo-user mailing list archives

From: Russ Weeks <rwe...@newbrightidea.com>
Subject: Re: Calculating averages with eg. StatsCombiner
Date: Tue, 15 Jul 2014 19:29:38 GMT
Thanks, Billie, that clears things up.
-Russ


On Tue, Jul 15, 2014 at 11:44 AM, Billie Rinaldi <billie.rinaldi@gmail.com>
wrote:

> Yes, any individual scan should be able to calculate an accurate average
> based on the entries present at the time of the scan.  You just can't
> pre-compute an average, but you can pre-compute the sum and count and do
> the division on the fly.  For averaging, finishing up the calculation is
> trivial, but it is a simple example of a reducer that loses information
> when calculating its result: there is no function f(avg(v_0, ... ,v_N),
> v_new) that equals avg(v_0, ... ,v_N, v_new) when you don't know N.  You
> would not want a combiner that loses information to run at the major or
> minor compaction (majc and minc) scopes.
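>
> Concretely, a stripped-down sketch of that idea (untested and
> hypothetical; the example StatsCombiner also tracks min and max, this
> keeps just the two fields you need for an average):
>
>   import java.util.Iterator;
>
>   import org.apache.accumulo.core.data.Key;
>   import org.apache.accumulo.core.data.Value;
>   import org.apache.accumulo.core.iterators.Combiner;
>
>   // Encodes "sum,count" in the Value; assumes each new entry is
>   // inserted as "<value>,1". Sum and count both combine
>   // associatively, so partial results merged at the scan, minc, and
>   // majc scopes stay correct, whereas a pre-computed average would
>   // not.
>   public class SumCountCombiner extends Combiner {
>     @Override
>     public Value reduce(Key key, Iterator<Value> iter) {
>       long sum = 0;
>       long count = 0;
>       while (iter.hasNext()) {
>         String[] parts = new String(iter.next().get()).split(",");
>         sum += Long.parseLong(parts[0]);
>         count += Long.parseLong(parts[1]);
>       }
>       return new Value((sum + "," + count).getBytes());
>     }
>   }
>
> and the division happens client-side at read time, e.g. for a scanner
> entry:
>
>   String[] parts = new String(entry.getValue().get()).split(",");
>   double avg = (double) Long.parseLong(parts[0])
>       / Long.parseLong(parts[1]);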
>
>
> On Fri, Jul 11, 2014 at 12:38 AM, Russ Weeks <rweeks@newbrightidea.com>
> wrote:
>
>> Hi,
>>
>> I'd like to understand this paragraph in the Accumulo manual a little
>> better:
>>
>> "The only restriction on an combining iterator is that the combiner
>> developer should not assume that all values for a given key have been seen,
>> since new mutations can be inserted at anytime. This precludes using the
>> total number of values in the aggregation such as when calculating an
>> average, for example."
>>
>> By "using the total number of values in the aggregation", I presume that
>> it means inside the combiner's reduce method? Because it seems like if I'm
>> using the example StatsCombiner registered on all 3 scopes, after the scan
>> completes the count and the sum fields should be consistent (w.r.t each
>> other, of course new mutations could have been added since the scan
>> started) and if I divide the two I'll get an accurate average, right?
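>>
>> For context, I'm attaching the combiner through the Java API along
>> these lines (connector and table name are from my test setup, and I'm
>> going from memory on the exact calls):
>>
>>   IteratorSetting setting =
>>       new IteratorSetting(10, "stats", StatsCombiner.class);
>>   StatsCombiner.setCombineAllColumns(setting, true);
>>   connector.tableOperations().attachIterator("testtable", setting,
>>       EnumSet.allOf(IteratorScope.class));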
>>
>> Thanks,
>> -Russ
>>
>
>
