crunch-dev mailing list archives

From "Rahul Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-57) Add a length function to PCollection
Date Fri, 14 Sep 2012 07:49:08 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455645#comment-13455645 ]

Rahul Sharma commented on CRUNCH-57:
------------------------------------

@Gabriel Yes, you are right on both issues. We can address them by requiring S to be
Comparable. But if we make things WritableComparable, then we can tap into Hadoop.
The Sort API also suffers from both of these issues. There too, if the Writable is not
comparable, the error will not be very clear, and S could have a comparator that differs from
the Writable's. I would think we want a similar fix here, i.e. basing things on S. But if we
make the Comparable fixes here, we will lose what Hadoop can do for us out of the box. If we
can keep the Sort API as it is, why not leave the min/max functions that way too?
Any thoughts? Or can we change something in the Sort API to bring it in sync with min/max?
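
To make the "Comparable S" option concrete, here is a minimal plain-Java sketch. It is not Crunch code and not taken from any of the attached patches; the class name MinMaxSketch and its methods are hypothetical. It only shows that bounding S by Comparable turns a non-comparable element type into a compile-time error rather than an unclear failure at run time.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Crunch or of the attached patches.
public class MinMaxSketch {

    // The bound on S means a call with a non-comparable element type does not compile.
    static <S extends Comparable<? super S>> S min(Iterable<S> values) {
        S best = null;
        for (S v : values) {
            if (best == null || v.compareTo(best) < 0) {
                best = v;
            }
        }
        return best; // null when the input is empty
    }

    static <S extends Comparable<? super S>> S max(Iterable<S> values) {
        S best = null;
        for (S v : values) {
            if (best == null || v.compareTo(best) > 0) {
                best = v;
            }
        }
        return best; // null when the input is empty
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(3, 1, 2);
        System.out.println(min(xs) + " " + max(xs));
    }
}
```

The ordering here comes from S itself, so it cannot disagree with a separate comparator on the serialized form. The cost, as noted above, is that the comparison Hadoop provides for WritableComparable types is not used.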
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch, CRUNCH-57.patch, MinMaxFn.patch, minver2.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a PCollection.
>
> For example, suppose there was an initial PCollection that was then filtered into another.
> If I'm interested in how many elements of the original PCollection matched the filter, I'll
> have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the number of elements
> in the PCollection and returns the result.
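
The counting described in the quoted issue can be sketched in plain Java as follows. This is an illustration over an in-memory Iterable, not the Crunch API; LengthSketch, length and lengthMatching are hypothetical names. A distributed version would presumably sum per-element contributions across tasks in the same way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; it does not use or mirror the Crunch API.
public class LengthSketch {

    // Each element contributes 1 to the total.
    static <T> long length(Iterable<T> elements) {
        long count = 0L;
        for (T ignored : elements) {
            count += 1L;
        }
        return count;
    }

    // Counts only the elements accepted by the filter.
    static <T> long lengthMatching(Iterable<T> elements, Predicate<? super T> filter) {
        long count = 0L;
        for (T element : elements) {
            if (filter.test(element)) {
                count += 1L;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "sort", "min", "max");
        System.out.println(length(words));
        System.out.println(lengthMatching(words, w -> w.startsWith("m")));
    }
}
```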

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
