incubator-crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Wills <jwi...@cloudera.com>
Subject Re: CombineFn vs. Aggregator
Date Fri, 02 Nov 2012 17:44:57 GMT
+brock explicitly

Brock, do you remember why we did AggregatorFactory? It was like a year
ago, and I can't remember if there was some additional reason for it or if
I just did the impl without thinking the problem through.

J

On Fri, Nov 2, 2012 at 9:53 AM, Matthias Friedrich <matt@mafr.de> wrote:

> On Friday, 2012-11-02, Josh Wills wrote:
> > On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <matt@mafr.de> wrote:
> [...]
> >> One thing I don't understand: Is there really a need for
> >> AggregatorFactory? Am I overlooking something? It's not like we have
> >> to create Aggregator instances for each key -- that's what
> >> Aggregator.reset() is for.
>
> > Yeah, the reason for AggregatorFactory (which I think is evident from the
> > tests) is when you want to do aggregations over tuples, e.g., PTable<K,
> > Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator for both
> > values of the Pair instance.
>
> Hmm, I'm not sure if that's something we need to prevent actively.
> Users will probably write it like this:
>
>   PGroupedTable<String, Pair<Integer, Integer>> table = /* ... */;
>   table.combineValues( pairAggregator(SUM_INTS(), SUM_INTS()) );
>
> And that's how I would document it on the Aggregators javadoc. Having
> the factory in there makes the API a bit harder to use if you want to
> write your own Aggregator. My guess is people will use this correctly,
> since it's more work to do it wrong.
>
> Anybody who really wants to shoot himself in the foot can always
> return the same instance for each factory call. But maybe they even
> have a good reason for it :)
>
> Regards,
>   Matthias
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message