incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-58) Implement PObject in Crunch/Scrunch
Date Thu, 06 Sep 2012 19:47:07 GMT


Kiyan Ahmadizadeh commented on CRUNCH-58:

Discussion of implementing PObjects started on the CRUNCH-57 ticket.  Josh gave this suggestion
for an implementation: 

Kiyan, do you have an opinion on how you want to go about this one? Do you want to take on
defining PObject (which in my mind, could just be a simple wrapper that materialized a PCollection
and then implemented some abstract function that did a computation on the materialized Iterable)
and incorporate it here?
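
For reference, the suggestion quoted above might be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Crunch: the names `MaterializedPObject`, `process`, and `MaxPObject` are hypothetical, and a plain `Supplier<Iterable<S>>` stands in for whatever call would materialize the PCollection.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not the Crunch API. The Supplier is an assumed
// stand-in for materializing a PCollection into an Iterable.
abstract class MaterializedPObject<S, T> {
  private final Supplier<Iterable<S>> materializer;
  private T cached;
  private boolean computed = false;

  MaterializedPObject(Supplier<Iterable<S>> materializer) {
    this.materializer = materializer;
  }

  // The "abstract function" that computes a value from the materialized data.
  protected abstract T process(Iterable<S> materialized);

  // Nothing is materialized until the value is first requested;
  // the result is then cached for later calls.
  public final T getValue() {
    if (!computed) {
      cached = process(materializer.get());
      computed = true;
    }
    return cached;
  }
}

// Example subclass (hypothetical): the maximum of a collection of integers.
class MaxPObject extends MaterializedPObject<Integer, Integer> {
  MaxPObject(Supplier<Iterable<Integer>> materializer) {
    super(materializer);
  }

  @Override
  protected Integer process(Iterable<Integer> materialized) {
    Integer max = null; // stays null if the collection is empty
    for (Integer i : materialized) {
      if (max == null || i > max) {
        max = i;
      }
    }
    return max;
  }
}
```

In this shape the reduction to a single value happens on the client, inside `process`, after the whole collection has been materialized.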

Josh, I think PObject should be a wrapper around PCollection, but the underlying PCollection
should contain only one element (or be treated as such).  In other words, it should wrap the
result of a distributed computation that reduced/combined a source PCollection into a target
PCollection of 1 element.  Then PObject could have a getValue method that materialized the
underlying PCollection and returned the singleton element found within.  I'm not sure if we
want to strongly enforce that the underlying PCollection for a PObject contains one element
by throwing an exception, or if we simply ignore any element but the first in the underlying

Your suggestion for "some abstract function that did a computation on the materialized Iterable"
doesn't make sense to me, since in my mind a PObject should only care about the first element
in its underlying PCollection.  Could you clarify?  
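
To make the alternative concrete, here is a minimal sketch of the singleton-style PObject described above. It is not Crunch code: `SingletonPObject` and the `strict` flag are hypothetical names, a `Supplier<Iterable<T>>` again stands in for materializing the underlying PCollection, and throwing on an empty collection is an assumption that the discussion has not settled.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch, not the Crunch API.
final class SingletonPObject<T> {
  private final Supplier<Iterable<T>> materializer; // assumed stand-in for materialization
  private final boolean strict; // true: reject collections with more than one element

  SingletonPObject(Supplier<Iterable<T>> materializer, boolean strict) {
    this.materializer = materializer;
    this.strict = strict;
  }

  // Materializes the underlying collection and returns its first element.
  public T getValue() {
    Iterator<T> it = materializer.get().iterator();
    if (!it.hasNext()) {
      // Assumption: an empty underlying collection is treated as an error.
      throw new IllegalStateException("underlying collection is empty");
    }
    T first = it.next();
    if (strict && it.hasNext()) {
      throw new IllegalStateException("underlying collection has more than one element");
    }
    return first; // in lenient mode, any further elements are ignored
  }
}
```

The `strict` flag covers both options raised above: with `strict` set, a collection of more than one element raises an exception; without it, every element after the first is silently ignored.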

> Implement PObject in Crunch/Scrunch
> -----------------------------------
>                 Key: CRUNCH-58
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Kiyan Ahmadizadeh
> FlumeJava has the concept of a PObject<T>, a container for a singleton of type
> T.  It is meant to represent the result of a distributed computation that yields a singleton
> value (for example max, min, and length methods on PCollection<T>).  Generally speaking,
> the result of any computation that combines/reduces a PCollection into a singleton value could
> be represented by a PObject.
> Like PCollection, a PObject defers distributed computation until its value is actually

