incubator-crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthias Friedrich <>
Subject CombineFn vs. Aggregator
Date Fri, 02 Nov 2012 10:13:17 GMT

in our last discussion on refactoring we agreed (I think) to clean
up CombineFn and move implementations to .fn.CombineFns. While playing
with the API, I noticed that it's harder than necessary to get your
generics in line. Example:

  PGroupedTable<String, Integer> table = /* ... */;

You need to specify the String type or you get a compilation error.
This leaks an implementation detail that isn't obvious to users; you
need to know how CombineFn works to know that you have to specify the
key type. And it's just plain ugly.

How about using Aggregator more prominently by adding a
combineValues(Aggregator<V> a) method to PGroupedTable? We could
simplify the code and make it easier to understand:

  PGroupedTable<String, Integer> table = /* ... */;

Writing custom aggregations would be easier, too, as you don't need to
decorate your aggregator with AggregatorCombineFn yourself anymore. We
could go even further and remove combineValues(CombineFn), because it
doesn't add much value anymore. If you need maximum flexibility (that
is, your aggregation depends on the table's key somehow), you can
still use parallelDo() with CombineFn directly. I don't think that's
a common use case though.

Here's what I propose:

  1) Leave CombineFn and Aggregator in base
  2) Remove all static factories (SUM_INTS etc.) from CombineFn
  3) Create an Aggregators class in .fn with static factories
     for everything removed from CombineFn
  4) Make the implementation classes private, they just take up
     lots of space in javaodcs
  5) Bonus: Remove combineValues(CombineFn)

One thing I don't understand: Is there really a need for
AggregatorFactory? Am I overlooking something? It's not like we have
to create Aggregator instances for each key -- that's what
Aggregator.reset() is for.


View raw message