crunch-dev mailing list archives

From Jinal Shah <jinalshah2...@gmail.com>
Subject Re: Generic class for converting PCollection to PTable
Date Thu, 20 Feb 2014 15:06:42 GMT
Thanks Gabriel, that works. Just curious: what's the benefit of using
PCollection.by as opposed to PCollection.parallelDo? In which use cases is
one better than the other?


On Thu, Feb 20, 2014 at 6:01 AM, Gabriel Reid <gabriel.reid@gmail.com> wrote:

> On Thu, Feb 20, 2014 at 11:59 AM, Jinal Shah <jinalshah2007@gmail.com>
> wrote:
> > Somewhat like that, as we are also using that same approach, but I was
> > thinking of it more as
> > PTables.asPTable(PCollection<V>, KeyFinder<V>, PType<K>), returning a
> > PTable<K,V>.
> >
> > Basically, KeyFinder<V> is an interface with some kind of method like
> > findKey(V) that returns a K, either taken from that V or computed in
> > whatever way it wants.
> >
>
> This is pretty much exactly what PCollection#by does. Your proposed
> method as you described it would be written as follows using
> PCollection#by:
>
>     PCollection<V> collection = ...;
>     PTable<K, V> table = collection.by(new KeyFinderMapFn(), ptypeForKey);
>
>
> The method is described at
>
> http://crunch.apache.org/apidocs/0.8.2/org/apache/crunch/PCollection.html#by(org.apache.crunch.MapFn,%20org.apache.crunch.types.PType)
>
> - Gabriel
>
>
>
> >
> >
> > On Thu, Feb 20, 2014 at 12:07 AM, Gabriel Reid <gabriel.reid@gmail.com>
> > wrote:
> >
> >>
> >>
> >> > On 20 Feb 2014, at 05:11, Jinal Shah <jinalshah2007@gmail.com> wrote:
> >> >
> >> > I didn't know that, but I was talking more about something like this:
> >> > PCollection<V> to PTable<K,V>, basically.
> >> >
> >>
> >> I think what you want is the PCollection#by method. It takes a MapFn that
> >> maps each value V to a key, and returns a PTable<K,V>.
> >>
> >> - Gabriel
> >>
> >> >
> >> >
> >> >> On Wed, Feb 19, 2014 at 5:49 PM, Josh Wills <jwills@cloudera.com>
> >> >> wrote:
> >> >>
> >> >> org.apache.crunch.lib.PTables.asPTable is likely what you want.
> >> >>
> >> >>
> >> >> On Wed, Feb 19, 2014 at 3:47 PM, Jinal Shah <jinalshah2007@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> Hi everyone,
> >> >>>
> >> >>> Is there a generic way of converting a PCollection to a PTable? If not,
> >> >>> can we create a generic class? We have a lot of places where we want
> >> >>> to perform a join on 2 PCollections, so we have to convert them into
> >> >>> PTables, do the join, and then convert the result back into a
> >> >>> PCollection. So I was wondering whether there is a better way of
> >> >>> doing this.
> >> >>>
> >> >>> Thanks
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Director of Data Science
> >> >> Cloudera <http://www.cloudera.com>
> >> >> Twitter: @josh_wills <http://twitter.com/josh_wills>
> >> >>
> >>
>
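
On the by versus parallelDo question at the top of this message, a rough way to picture it: parallelDo is the general per-element operation, which may emit zero, one, or many outputs per input, while by covers the narrower case of attaching exactly one key to each value and leaving the value unchanged. The sketch below models that relationship in plain Java using only the standard library. It does not use Crunch at all: KeyBySketch and its parallelDo and by methods are hypothetical stand-ins invented for illustration, with a List playing the role of a PCollection and a List of Map.Entry playing the role of a PTable.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Plain-Java stand-ins only; none of these names are Crunch API.
final class KeyBySketch {

    // Stand-in for a general parallelDo-style operation: the function may
    // emit zero, one, or many outputs for each input element.
    static <S, T> List<T> parallelDo(List<S> input, BiConsumer<S, List<T>> doFn) {
        List<T> out = new ArrayList<>();
        for (S element : input) {
            doFn.accept(element, out);
        }
        return out;
    }

    // Stand-in for a by-style operation: exactly one (key, value) pair per
    // input, with the value passed through unchanged. It is expressed here
    // in terms of the more general parallelDo stand-in above.
    static <K, V> List<Map.Entry<K, V>> by(List<V> input, Function<V, K> keyFinder) {
        return parallelDo(input, (V value, List<Map.Entry<K, V>> out) ->
                out.add(new SimpleImmutableEntry<>(keyFinder.apply(value), value)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "kiwi", "fig");
        // Key each word by its length.
        List<Map.Entry<Integer, String>> table = by(words, String::length);
        System.out.println(table); // prints [5=apple, 4=kiwi, 3=fig]
    }
}
```

Under these assumptions the by form is mostly a readability win when all that is needed is a key per element, as in the join use case from the original question. The general form is the one to reach for when elements need to be filtered, expanded, or transformed along the way.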
