accumulo-user mailing list archives

From Billie Rinaldi <billie.rina...@gmail.com>
Subject Re: Calculating averages with eg. StatsCombiner
Date Tue, 15 Jul 2014 18:44:23 GMT
Yes, any individual scan should be able to calculate an accurate average
based on the entries present at the time of the scan.  You just can't
pre-compute the average itself, but you can pre-compute the sum and count
and do the division on the fly.  For averaging, finishing up the
calculation is trivial, but the average is a simple example of a reduction
that loses information when it computes its result: there is no function
f(avg(v_0, ..., v_N), v_new) that equals avg(v_0, ..., v_N, v_new) when
you don't know N.  You would not want a combiner that loses information to
run at the major or minor compaction scopes, since a compaction
permanently rewrites the data it combines.
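
As a minimal sketch of that distinction (plain Java, not Accumulo's
Combiner API; the SumCount name is illustrative), a partial aggregate that
carries the pair composes losslessly, while a bare average does not:

  // Illustrative only: a lossless partial aggregate for averaging.
  public final class SumCount {
    final long sum;
    final long count;

    SumCount(long sum, long count) {
      this.sum = sum;
      this.count = count;
    }

    // Merging two partials needs nothing beyond the pairs themselves,
    // so this combine is safe to apply at any scope, in any order.
    static SumCount merge(SumCount a, SumCount b) {
      return new SumCount(a.sum + b.sum, a.count + b.count);
    }

    // The division happens only when a result is read, on the fly.
    double average() {
      return count == 0 ? 0.0 : (double) sum / count;
    }
  }

  // By contrast, f(avg, v_new) cannot be written: without the count,
  // there is no way to weight avg against v_new.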


On Fri, Jul 11, 2014 at 12:38 AM, Russ Weeks <rweeks@newbrightidea.com>
wrote:

> Hi,
>
> I'd like to understand this paragraph in the Accumulo manual a little
> better:
>
> "The only restriction on an combining iterator is that the combiner
> developer should not assume that all values for a given key have been seen,
> since new mutations can be inserted at anytime. This precludes using the
> total number of values in the aggregation such as when calculating an
> average, for example."
>
> By "using the total number of values in the aggregation", I presume that
> it means inside the combiner's reduce method? Because it seems like if I'm
> using the example StatsCombiner registered on all 3 scopes, after the scan
> completes the count and the sum fields should be consistent (w.r.t each
> other, of course new mutations could have been added since the scan
> started) and if I divide the two I'll get an accurate average, right?
>
> Thanks,
> -Russ
>
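
For concreteness, here is a sketch of the setup the question describes,
assuming the StatsCombiner from the 1.x examples
(org.apache.accumulo.examples.simple.combiner.StatsCombiner) and its
comma-separated min,max,sum,count value encoding; the class name, the
"radix" option, and the field order are taken from that example and may
differ across versions:

  import java.util.EnumSet;
  import java.util.Map.Entry;

  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.IteratorSetting;
  import org.apache.accumulo.core.client.Scanner;
  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.iterators.Combiner;
  import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
  import org.apache.accumulo.core.security.Authorizations;

  public class AverageFromStats {
    static void setup(Connector conn, String table) throws Exception {
      IteratorSetting setting = new IteratorSetting(10, "stats",
          "org.apache.accumulo.examples.simple.combiner.StatsCombiner");
      Combiner.setCombineAllColumns(setting, true);
      setting.addOption("radix", "10");
      // Registering on all three scopes is safe here because the
      // combiner keeps sum and count, losing no information.
      conn.tableOperations().attachIterator(table, setting,
          EnumSet.allOf(IteratorScope.class));
    }

    static void printAverages(Connector conn, String table) throws Exception {
      Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
      for (Entry<Key,Value> e : scanner) {
        // StatsCombiner emits "min,max,sum,count"; divide on the fly.
        String[] stats = e.getValue().toString().split(",");
        long sum = Long.parseLong(stats[2]);
        long count = Long.parseLong(stats[3]);
        System.out.println(e.getKey().getRow() + " avg="
            + (count == 0 ? 0.0 : (double) sum / count));
      }
    }
  }

Passing the combiner class name as a string keeps the client free of a
compile-time dependency on the examples jar; the class itself still has to
be on the tablet servers' classpath.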
