incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-57) Add a length function to PCollection
Date Wed, 05 Sep 2012 22:18:08 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449193#comment-13449193 ]

Kiyan Ahmadizadeh commented on CRUNCH-57:
-----------------------------------------

Taking a look at the FlumeJava paper (this copy: http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf),
it looks like the answer to this is the PObject, which acts a bit like a Future. The
difference is that a PObject defers the start of computation until its value is accessed,
while a Java Future typically represents a computation that starts as soon as the task is
submitted, and a call to get() blocks if the Future has yet to complete.
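
To make the contrast concrete, here is a minimal sketch in plain Java. LazyValue is a
hypothetical stand-in for the PObject idea, not a Crunch or FlumeJava class, and caching
the result after the first access is my own assumption. CompletableFuture.supplyAsync is
used only as one familiar example of an eagerly started Future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a PObject: nothing runs until getValue() is called. */
final class LazyValue<T> {
    private final Supplier<T> computation;
    private T value;
    private boolean computed;

    LazyValue(Supplier<T> computation) {
        this.computation = computation;
    }

    /** Runs the computation on first access and caches the result (an assumption). */
    synchronized T getValue() {
        if (!computed) {
            value = computation.get();
            computed = true;
        }
        return value;
    }
}

final class LazyVersusFuture {
    public static void main(String[] args) {
        AtomicInteger lazyRuns = new AtomicInteger();
        AtomicInteger eagerRuns = new AtomicInteger();

        // Deferred: constructing the holder does not run the supplier.
        LazyValue<Integer> lazy = new LazyValue<>(() -> {
            lazyRuns.incrementAndGet();
            return 42;
        });

        // Eager: supplyAsync schedules the supplier immediately on a pool thread.
        CompletableFuture<Integer> eager = CompletableFuture.supplyAsync(() -> {
            eagerRuns.incrementAndGet();
            return 42;
        });
        eager.join(); // block until the eager computation has finished

        System.out.println("lazy runs before access: " + lazyRuns.get()); // 0
        System.out.println("eager runs after join: " + eagerRuns.get());  // 1

        System.out.println("lazy value: " + lazy.getValue());             // 42
        System.out.println("lazy runs after access: " + lazyRuns.get());  // 1
    }
}
```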

It seems like the PObject concept would be generally useful.  Min and max on PCollection could
be changed to return a PObject, as could this length method, etc.
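
As a rough illustration of what a length method returning a deferred value might look
like, here is a toy in-memory sketch. ToyCollection and DeferredValue are hypothetical
names invented for this example. They are not the Crunch API, and a real implementation
would run as part of the pipeline rather than over a List.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical deferred single value, modeled loosely on the PObject idea. */
interface DeferredValue<T> {
    T getValue();
}

/** Toy in-memory stand-in for a PCollection; all names here are assumptions. */
final class ToyCollection<T> {
    private final Supplier<List<T>> source;

    private ToyCollection(Supplier<List<T>> source) {
        this.source = source;
    }

    static <T> ToyCollection<T> of(List<T> elements) {
        return new ToyCollection<>(() -> elements);
    }

    /** Describes a filtered collection; the filter runs only when the data is needed. */
    ToyCollection<T> filter(Predicate<T> keep) {
        return new ToyCollection<>(
            () -> source.get().stream().filter(keep).collect(Collectors.toList()));
    }

    /** Returns a handle to the element count; counting happens in getValue(). */
    DeferredValue<Long> length() {
        return () -> (long) source.get().size();
    }
}

final class LengthDemo {
    public static void main(String[] args) {
        ToyCollection<Integer> numbers = ToyCollection.of(Arrays.asList(1, 2, 3, 4, 5, 6));
        ToyCollection<Integer> evens = numbers.filter(n -> n % 2 == 0);

        // No filtering or counting has happened yet; this is only a handle.
        DeferredValue<Long> evenCount = evens.length();

        System.out.println(evenCount.getValue()); // prints 3
    }
}
```

This covers the filter case from the issue description: the count of matching elements
comes from the same deferred mechanism, with no extra counting code in user code.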

Perhaps we should make another ticket for implementing PObject.  Thoughts?
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered into another.
> If I'm interested in how many elements of the original PCollection matched the filter, I'll
> have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the number of elements
> in the PCollection and returns the result.

