incubator-crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Wills <jwi...@cloudera.com>
Subject Re: CombineFn vs. Aggregator
Date Fri, 02 Nov 2012 18:45:02 GMT
On Fri, Nov 2, 2012 at 11:42 AM, Brock Noland <brock@cloudera.com> wrote:

> Like a bear awoken from a slumber...inline.
>
> On Fri, Nov 2, 2012 at 12:44 PM, Josh Wills <jwills@cloudera.com> wrote:
> > +brock explicitly
> >
> > Brock, do you remember why we did AggregatorFactory? It was like a year
> ago,
> > and I can't remember if there was some additional reason for it or if I
> just
> > did the impl without thinking the problem through.
> >
> > J
> >
> > On Fri, Nov 2, 2012 at 9:53 AM, Matthias Friedrich <matt@mafr.de> wrote:
> >>
> >> On Friday, 2012-11-02, Josh Wills wrote:
> >> > On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <matt@mafr.de>
> wrote:
> >> > Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator
for
> >> > both
> >> > values of the Pair instance.
> >>
> >> Hmm, I'm not sure if that's something we need to prevent actively.
> >> Users will probably write it like this:
> >>
> >>   PGroupedTable<String, Pair<Integer, Integer>> table = /* ... */;
> >>   table.combineValues( pairAggregator(SUM_INTS(), SUM_INTS()) );
> >>
>
> Without the factory or a similar concept, this wouldn't work, right?
> Meaning that if all we did was eliminate the factory, then there would
> be a single SumInts object ClassLoader wide. Has the implementation
> changed so much my comments are out of date?
>

I think Matthias is saying to sub out the objects w/classes and just return
a new instance of Aggregator each time this is called. It feels like the
obvious thing to do; and yet, we didn't do it, and I'm not sure why.


> Brock
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message