incubator-crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthias Friedrich <m...@mafr.de>
Subject Re: CombineFn vs. Aggregator
Date Fri, 02 Nov 2012 16:53:11 GMT
On Friday, 2012-11-02, Josh Wills wrote:
> On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <matt@mafr.de> wrote:
[...] 
>> One thing I don't understand: Is there really a need for
>> AggregatorFactory? Am I overlooking something? It's not like we have
>> to create Aggregator instances for each key -- that's what
>> Aggregator.reset() is for.
 
> Yeah, the reason for AggregatorFactory (which I think is evident from the
> tests) is when you want to do aggregations over tuples, e.g., PTable<K,
> Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator for both
> values of the Pair instance.
 
Hmm, I'm not sure if that's something we need to prevent actively.
Users will probably write it like this:

  PGroupedTable<String, Pair<Integer, Integer>> table = /* ... */;
  table.combineValues( pairAggregator(SUM_INTS(), SUM_INTS()) );
 
And that's how I would document it on the Aggregators javadoc. Having
the factory in there makes the API a bit harder to use if you want to
write your own Aggregator. My guess is people will use this correctly,
since it's more work to do it wrong.

Anybody who really wants to shoot himself in the foot can always
return the same instance for each factory call. But maybe they even
have a good reason for it :)

Regards,
  Matthias

Mime
View raw message