incubator-crunch-dev mailing list archives

From "Rahul Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-23) PCollection#sort doesn't do a full sort on values
Date Thu, 23 Aug 2012 11:17:42 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440217#comment-13440217 ]

Rahul Sharma commented on CRUNCH-23:
------------------------------------

I am able to create sequence files for Avro data, with AvroKey as the key class. When the
file is read back in TotalOrderPartitioner, it throws an exception because it expects the key
to be of type WritableComparable:

java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to org.apache.hadoop.io.WritableComparable
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:295)
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:80)

Any suggestions?

> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
>                 Key: CRUNCH-23
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-23
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Rahul Sharma
>         Attachments: 0001-CRUNCH-23-fix-sorting.patch, CRUNCH-23-sorting-issue.patch,
>                      CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sorting that is performed
> is only per reducer, and not an absolute sort over all values. This means that the values
> are not in sorted order if they are iterated over on a materialized collection. It also means
> that the sorted files that are output from a sort operation cannot simply be concatenated
> to produce a single sorted file.
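
To illustrate the behaviour described in the issue, here is a minimal sketch in plain Java with
no Hadoop or Crunch dependencies. It is not the code from any of the attached patches. The class
TotalSortSketch, its method names, the sample keys, and the single split point 5 are all
hypothetical and chosen only for illustration. It simulates two reducers that each sort their own
partition, once with hash-style partitioning and once with range partitioning around a split
point, which is the idea a total-order partitioner relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only; none of these names come from Crunch or Hadoop.
public class TotalSortSketch {

    // Hash-style partitioning: keys are spread over reducers without regard to order.
    static int hashPartition(int key, int numReducers) {
        return Math.floorMod(Integer.hashCode(key), numReducers);
    }

    // Range partitioning: splitPoints must be sorted and distinct. Reducer i receives
    // the keys below splitPoints[i]; the last reducer receives everything else.
    static int rangePartition(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Assign each key to a reducer, sort within each reducer, then concatenate the
    // reducer outputs in reducer order (as if concatenating the output files).
    static List<Integer> sortPerReducerAndConcatenate(
            List<Integer> input, int numReducers, IntUnaryOperator partitioner) {
        List<List<Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (int key : input) {
            reducers.get(partitioner.applyAsInt(key)).add(key);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> reducer : reducers) {
            Collections.sort(reducer);
            out.addAll(reducer);
        }
        return out;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(7, 2, 9, 4, 1, 8, 3, 6);
        int[] splitPoints = {5}; // assumed split point for two reducers

        List<Integer> hashed =
                sortPerReducerAndConcatenate(input, 2, k -> hashPartition(k, 2));
        List<Integer> ranged =
                sortPerReducerAndConcatenate(input, 2, k -> rangePartition(k, splitPoints));

        System.out.println("hash partitioned:  " + hashed + " sorted=" + isSorted(hashed));
        System.out.println("range partitioned: " + ranged + " sorted=" + isSorted(ranged));
    }
}
```

With hash-style partitioning each reducer's output is sorted, but the concatenation is not. With
range partitioning every key in reducer 0 is smaller than every key in reducer 1, so the
concatenated output is fully sorted. In a real job the split points would have to be derived from
the data, for example by sampling the keys.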

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira