incubator-crunch-dev mailing list archives

From "Kiyan Ahmadizadeh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-57) Add a length function to PCollection
Date Thu, 06 Sep 2012 20:04:07 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449999#comment-13449999 ]

Kiyan Ahmadizadeh commented on CRUNCH-57:
-----------------------------------------

I'm up for taking on an implementation of PObject and incorporating it into this change. 
I've created a ticket CRUNCH-58 for this.  Josh, please check that ticket for some discussion
on the implementation of PObject.  

+1 for using decorators to achieve the Fluent pattern without crowding the methods in the
PCollection interface.  This would work well in Java and Scala. I think Gabriel's geometry
example highlights the issue that you may want special operations on PCollections holding
objects of a specific type.  Another example would be PCollections of numeric data.  It would
make sense for such collections to have special operations like average, sum, etc.  

-1 on not including length() in the base PCollection interface, however.  I think decorators
are great for the case outlined above, where the functionality applies only to PCollections
holding objects of a specific type.  Counting the number of elements in a PCollection, however,
is applicable to all PCollections regardless of the type of object they contain.  I think operations
that can apply to any and all PCollections belong in the PCollection interface, and operations
applicable to a specific kind of PCollection belong in decorators.  For this reason I argue
that length() goes in the PCollection interface.  
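
The split argued for above can be sketched roughly as follows. This is only an in-memory
illustration, not Crunch code: SketchCollection, NumericCollection and LengthSketch are
hypothetical names standing in for PCollection and a numeric decorator, and the sketch ignores
the fact that a real PCollection is computed by running a pipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for PCollection; NOT the real Crunch interface.
interface SketchCollection<T> {
    List<T> elements();

    // Meaningful for any element type, so it sits on the base interface.
    default long length() {
        return elements().size();
    }

    // Returns a new collection holding only the elements that match.
    default SketchCollection<T> filter(Predicate<? super T> keep) {
        List<T> kept = elements().stream().filter(keep).collect(Collectors.toList());
        return () -> kept;
    }
}

// Hypothetical decorator: operations that only make sense for numeric elements.
final class NumericCollection implements SketchCollection<Double> {
    private final SketchCollection<Double> delegate;

    NumericCollection(SketchCollection<Double> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<Double> elements() {
        return delegate.elements();
    }

    double sum() {
        return elements().stream().mapToDouble(Double::doubleValue).sum();
    }

    // Defined as 0.0 for an empty collection to avoid dividing by zero.
    double average() {
        return length() == 0 ? 0.0 : sum() / length();
    }
}

class LengthSketch {
    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        SketchCollection<Double> all = () -> data;
        SketchCollection<Double> matched = all.filter(x -> x > 2.0);

        // length() is available on every collection, decorated or not.
        System.out.println(matched.length() + " of " + all.length() + " matched the filter");

        // sum() and average() exist only on the numeric decorator.
        NumericCollection numbers = new NumericCollection(matched);
        System.out.println("sum=" + numbers.sum() + " average=" + numbers.average());
    }
}
```

In this sketch length() is callable on the filtered collection without any wrapping, while
sum() and average() become available only after wrapping it in the numeric decorator.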
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered into another.
> If I'm interested in how many elements of the original PCollection matched the filter, I'll
> have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the number of elements
> in the PCollection and returns the result.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
