crunch-dev mailing list archives

From Chandan Biswas <cbiswas1...@gmail.com>
Subject Re: Process of CombineFn<S,T> returns <S,U>?
Date Thu, 17 Oct 2013 21:30:09 GMT
Hello Micah,
Yes, we are using a MapFn now, but that aggregation and computation is being
done in the reduce phase. Since a CombineFn applied after the GBK runs on the
map side, most of the computation that now runs in the reduce phase could be
done on the map side instead, leaving only some smaller aggregations and
computations for the reduce phase. My point was to do some of the aggregation
(and create a new object) in the map phase instead of in the reduce phase.
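
To make the idea concrete, here is a rough sketch in plain Java (java.util
only, not the Crunch API). It assumes one possible way around the same-type
restriction: convert each value to the aggregate type before grouping, so
that the combine step takes and returns the same type and can run on the map
side. The names Stats, mapSide and reduceSide are hypothetical and exist only
for this illustration.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapSideAggregation {

    // Hypothetical aggregate type (the "U" of the thread): a running sum and count.
    static final class Stats {
        final long sum;
        final long count;

        Stats(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Lift a single raw value (the "T") into the aggregate type.
        static Stats of(int value) {
            return new Stats(value, 1);
        }

        // Same type in, same type out, so it can be applied any number of times.
        Stats plus(Stats other) {
            return new Stats(sum + other.sum, count + other.count);
        }

        double mean() {
            return (double) sum / count;
        }
    }

    // Small helper to build a key/value record.
    static Map.Entry<String, Integer> entry(String key, int value) {
        return new AbstractMap.SimpleEntry<String, Integer>(key, value);
    }

    // Models the map side: lift each value to Stats and pre-aggregate per key
    // within one input split.
    static Map<String, Stats> mapSide(List<Map.Entry<String, Integer>> split) {
        Map<String, Stats> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> rec : split) {
            partial.merge(rec.getKey(), Stats.of(rec.getValue()), Stats::plus);
        }
        return partial;
    }

    // Models the reduce side: only the small partial aggregates are merged here.
    static Map<String, Stats> reduceSide(List<Map<String, Stats>> partials) {
        Map<String, Stats> total = new TreeMap<>();
        for (Map<String, Stats> partial : partials) {
            for (Map.Entry<String, Stats> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Stats::plus);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> split1 =
            Arrays.asList(entry("a", 1), entry("a", 3), entry("b", 10));
        List<Map.Entry<String, Integer>> split2 =
            Arrays.asList(entry("a", 5), entry("b", 20));

        Map<String, Stats> total =
            reduceSide(Arrays.asList(mapSide(split1), mapSide(split2)));
        for (Map.Entry<String, Stats> e : total.entrySet()) {
            System.out.println(e.getKey() + " mean=" + e.getValue().mean());
        }
    }
}
```

In this sketch the combine step only ever sees Stats values, which is why it
is safe to apply it zero, one or many times before the final merge. Whether
the same shape fits the real pipeline depends on whether the aggregate can be
built incrementally like this.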

Thanks,
Chandan


On Thu, Oct 17, 2013 at 3:48 PM, Micah Whitacre <mkwhit@gmail.com> wrote:

> Chandan,
>    I think what you want is just a simple MapFn instead of a
> CombineFn.  The doc of the CombineFn [1] sounds like what you want with the
> statement "A special DoFn
> <http://crunch.apache.org/apidocs/0.7.0/org/apache/crunch/DoFn.html>
> implementation that converts an Iterable
> <http://download.oracle.com/javase/6/docs/api/java/lang/Iterable.html?is-external=true>
> of values into a single value", but it expects the value to be of the same
> type.  Since you want to combine the values into a different form, it
> should be fairly trivial to write a MapFn that converts the Iterable<T> ->
> U.
>
> [1] -
> http://crunch.apache.org/apidocs/0.7.0/org/apache/crunch/CombineFn.html
>
>
> On Thu, Oct 17, 2013 at 3:30 PM, Chandan Biswas <cbiswas1983@gmail.com> wrote:
>
> > I was trying to refactor some code to use CombineFn, but when I dug
> > deeper I found that I can't, because Crunch doesn't allow the
> > functionality I need. For example, I have a PGroupedTable<S,T>. I wanted
> > to apply a CombineFn<S,T> to it and get a PCollection<S,U> instead of T.
> > Right now, CombineFn only allows the same type as its return value. The
> > use case for this is that there would be some time saved in sorting. It
> > is natural that aggregating objects on the map side can create an object
> > of a new, different type.
> >
> > Any thoughts on this? Am I missing anything? If this can be written
> > another way with the existing API, please let me know.
> >
> > Thanks
> > Chandan
> >
>
