incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-58) Implement PObject in Crunch/Scrunch
Date Tue, 11 Sep 2012 23:16:08 GMT


Kiyan Ahmadizadeh updated CRUNCH-58:

    Attachment: CRUNCH-58.patch

    This commit adds PObjects to Crunch.  A PObject encapsulates a singleton
    value produced from a distributed computation.  The changes in this commit
    include:
    1. Adding a PObject interface to the Java code base.
    2. Adding an abstract class PObjectImpl that implements a PObject backed by
    a PCollection.  Concrete subclasses implement the PObjectImpl#process method
    to transform an iterable obtained from materializing the backing PCollection
    into the singleton value encapsulated by the PObject.
    3. Adding concrete subclasses of PObjectImpl that a) use the first element of
    the backing PCollection as the PObject value, b) use a Java collection
    containing the elements of the backing PCollection as the PObject value, and
    c) use a Java Map containing the mappings defined by Pairs in the backing
    PCollection as the PObject value.
    4. Modifying min() and max() on PCollection to return PObjects.
    5. Adding an asCollection method to PCollection<S> that returns a
    PObject<Collection<S>> of the PCollection's elements.
    6. Adding an asMap method to PTable<K, V> that returns a PObject<Map<K,V>>
    of the PTable's elements.
    7. Adding PObject to the Scala code base and modifying min() and max()
    in Scala's PCollection to return PObjects.
    Tests have been added for PObjectImpl and its concrete subclasses. Tests
    for the new asCollection and asMap methods have also been added. Existing
    tests were modified to accommodate changes to min() and max().
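
The design described in items 1 to 3 above can be illustrated with a minimal, self-contained sketch. This is not the code in CRUNCH-58.patch: the accessor name getValue(), the subclass names FirstElementPObject and CollectionPObject, the use of a Supplier as a stand-in for materializing the backing PCollection, and the choice to return null for an empty input are all assumptions made for illustration. Only the names PObject, PObjectImpl, and process come from the message above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the PObject design; not the patch itself.
class PObjectSketch {

    // A container for a singleton value produced by a computation.
    interface PObject<T> {
        // Assumed accessor name: triggers the computation and returns the value.
        T getValue();
    }

    // Backed by a source of elements; subclasses decide how to reduce them.
    abstract static class PObjectImpl<S, T> implements PObject<T> {
        // Stand-in (assumption) for materializing the backing PCollection<S>.
        private final Supplier<Iterable<S>> materializer;

        PObjectImpl(Supplier<Iterable<S>> materializer) {
            this.materializer = materializer;
        }

        @Override
        public final T getValue() {
            // Nothing is materialized until the value is requested.
            return process(materializer.get());
        }

        // Transforms the materialized elements into the singleton value.
        protected abstract T process(Iterable<S> input);
    }

    // Variant (a): the first element is the value (null if empty, by assumption).
    static class FirstElementPObject<S> extends PObjectImpl<S, S> {
        FirstElementPObject(Supplier<Iterable<S>> materializer) {
            super(materializer);
        }

        @Override
        protected S process(Iterable<S> input) {
            Iterator<S> it = input.iterator();
            return it.hasNext() ? it.next() : null;
        }
    }

    // Variant (b): a Java collection holding every element is the value.
    static class CollectionPObject<S> extends PObjectImpl<S, Collection<S>> {
        CollectionPObject(Supplier<Iterable<S>> materializer) {
            super(materializer);
        }

        @Override
        protected Collection<S> process(Iterable<S> input) {
            List<S> out = new ArrayList<>();
            for (S element : input) {
                out.add(element);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Supplier<Iterable<Integer>> source = () -> Arrays.asList(3, 1, 2);
        PObject<Integer> first = new FirstElementPObject<>(source);
        PObject<Collection<Integer>> all = new CollectionPObject<>(source);
        System.out.println(first.getValue()); // prints 3
        System.out.println(all.getValue());   // prints [3, 1, 2]
    }
}
```

In this sketch the Map-valued variant (c) is omitted for brevity; it would follow the same pattern, with process building a Map from key/value pairs.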

> Implement PObject in Crunch/Scrunch
> -----------------------------------
>                 Key: CRUNCH-58
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Kiyan Ahmadizadeh
>         Attachments: CRUNCH-58.patch
> FlumeJava has the concept of a PObject<T>, a container for a singleton of type T.
> It is meant to represent the result of a distributed computation that yields a
> singleton value (for example, the max, min, and length methods on PCollection<T>).
> Generally speaking, the result of any computation that combines/reduces a
> PCollection into a singleton value could be represented by a PObject.
> Like PCollection, a PObject defers distributed computation until its value is actually

