crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-58) Implement PObject in Crunch/Scrunch
Date Wed, 12 Sep 2012 11:37:10 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453908#comment-13453908 ]

Josh Wills commented on CRUNCH-58:
----------------------------------

I understand the motivation on #2, but #1 is kinda iffy for the reason you mentioned (i.e., in
Crunch, everything is based off of PCollection). I would prefer to have the interface only
provide the getValue method for now. write() is somewhat misleading, IMO, since it writes
the underlying PCollection, which is not necessarily the same type as the value returned by
the PObject, and the getName/getPipeline stuff shouldn't be necessary since we tend to think
of PObjects as end points, and those methods are intended to be used when we're going to do
some subsequent processing on a PCollection (e.g., the lib/* methods make extensive use of
them, and I don't know that we have PObject lib/* related methods in mind just yet.)

Of course, if we have use cases later on where exposing those methods makes sense, adding them
to the interface won't be a big deal. I'm happy to just take your existing patch as-is and
prune off those methods, if it's alright with you.
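
To make the write() concern concrete, here is a rough sketch of what the pruned interface could look like. Only PObject and getValue come from the discussion above; LengthPObject and the demo class are hypothetical names, and a plain java.util.List stands in for a PCollection, so this is an illustration rather than the code in the attached patch.

```java
import java.util.Arrays;
import java.util.List;

// Pruned-down interface as proposed above: getValue() only.
// No write(), getName(), or getPipeline().
interface PObject<T> {
    T getValue();
}

// Hypothetical example: the backing collection holds Strings, but the
// PObject's value is a Long. A write() method here would write the
// Strings, not the Long, which is why it reads as misleading.
final class LengthPObject implements PObject<Long> {
    // Stand-in for a PCollection<String> (assumption for this sketch).
    private final List<String> backing;

    LengthPObject(List<String> backing) {
        this.backing = backing;
    }

    @Override
    public Long getValue() {
        return (long) backing.size();
    }
}

class PrunedPObjectDemo {
    public static void main(String[] args) {
        PObject<Long> length = new LengthPObject(Arrays.asList("a", "b", "c"));
        System.out.println(length.getValue()); // prints 3
    }
}
```

If a later use case needs getName/getPipeline, they could be added to the interface without breaking callers that only use getValue.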
                
> Implement PObject in Crunch/Scrunch
> -----------------------------------
>
>                 Key: CRUNCH-58
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-58
>             Project: Crunch
>          Issue Type: New Feature
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Kiyan Ahmadizadeh
>         Attachments: CRUNCH-58.patch
>
>
> FlumeJava has the concept of a PObject<T>, a container for a singleton of type
> T.  It is meant to represent the result of a distributed computation that yields a singleton
> value (for example max, min, and length methods on PCollection<T>).  Generally speaking,
> the result of any computation that combines/reduces a PCollection into a singleton value could
> be represented by a PObject.
> Like PCollection, a PObject defers distributed computation until its value is actually
> used.
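
The deferred-evaluation behavior described in the quoted issue can be sketched roughly as follows. LazyPObject is a hypothetical name, a java.util.function.Supplier stands in for the distributed computation, and caching the value after the first call is an assumption of this sketch, not something the issue specifies.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: holds a computation and runs it only when the
// value is first requested.
final class LazyPObject<T> {
    private final Supplier<T> computation; // stand-in for a pipeline run
    private T value;
    private boolean computed = false;

    LazyPObject(Supplier<T> computation) {
        this.computation = computation;
    }

    // Runs the computation on the first call and reuses the result
    // afterwards (the caching is an assumption of this sketch).
    synchronized T getValue() {
        if (!computed) {
            value = computation.get();
            computed = true;
        }
        return value;
    }
}

class LazyPObjectDemo {
    static int runs = 0; // counts how often the computation executes

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(4, 9, 2);
        LazyPObject<Integer> max = new LazyPObject<>(() -> {
            runs++;
            return Collections.max(data);
        });

        System.out.println(runs);           // prints 0: nothing has run yet
        System.out.println(max.getValue()); // prints 9
        max.getValue();
        System.out.println(runs);           // prints 1: computed only once
    }
}
```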

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
