Date: Thu, 23 Aug 2012 20:21:42 +1100 (NCT)
From: "Rahul Sharma (JIRA)"
To: crunch-dev@incubator.apache.org
Message-ID: <2097775978.4740.1345713702642.JavaMail.jiratomcat@arcas>
In-Reply-To: <1853483155.85795.1342870774642.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (CRUNCH-23) PCollection#sort doesn't do a full sort on values

    [ https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440170#comment-13440170 ]

Rahul Sharma commented on CRUNCH-23:
------------------------------------

Gabriel,
I checked the patch against Avro files and it does not work. My bad, I should have verified it earlier.
Also, while fixing it I am stuck at one point: in the end, the TotalOrderPartitioner requires a SequenceFile. How can I make one using the keys from the Avro data? I am still trying out a few options, e.g. configuring AvroSequenceFileOutputFormat.

> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
>                 Key: CRUNCH-23
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-23
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Rahul Sharma
>         Attachments: 0001-CRUNCH-23-fix-sorting.patch, CRUNCH-23-sorting-issue.patch, CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sorting that is
> performed is only per reducer, and not an absolute sort over all values.
> This means that the values are not in sorted order if they are iterated
> over on a materialized collection. It also means that the sorted files
> that are output from a sort operation cannot simply be concatenated to
> produce a single sorted file.
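
For readers following the thread, the per-reducer behavior described in the issue can be illustrated without Hadoop. The sketch below is a plain-Java simulation, not Crunch or Hadoop code: the class name TotalOrderSketch, its method names, the sample keys, and the single split point are all hypothetical. It assumes hash-style partitioning as the default and shows why range partitioning on sorted split points, which is the idea behind a total-order partitioner, makes the concatenated reducer outputs globally sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation: each inner list stands in for one reducer's input.
public class TotalOrderSketch {

    // Hash-style partitioning: a key's reducer depends on its hash, not its rank.
    public static List<List<Integer>> partitionByHash(List<Integer> keys, int numReducers) {
        List<List<Integer>> parts = emptyPartitions(numReducers);
        for (Integer key : keys) {
            parts.get((key.hashCode() & Integer.MAX_VALUE) % numReducers).add(key);
        }
        return parts;
    }

    // Range partitioning: sorted split points give each reducer a contiguous key range.
    // Keys below splitPoints[0] go to partition 0; a key equal to a split point
    // goes to the partition on its right.
    public static List<List<Integer>> partitionByRange(List<Integer> keys, int[] splitPoints) {
        List<List<Integer>> parts = emptyPartitions(splitPoints.length + 1);
        for (Integer key : keys) {
            int pos = Arrays.binarySearch(splitPoints, key);
            int partition = (pos >= 0) ? pos + 1 : -(pos + 1);
            parts.get(partition).add(key);
        }
        return parts;
    }

    // Sort each partition on its own, then concatenate in partition order,
    // as if the reducer output files were appended one after another.
    public static List<Integer> sortEachAndConcatenate(List<List<Integer>> parts) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            List<Integer> sorted = new ArrayList<>(part);
            Collections.sort(sorted);
            out.addAll(sorted);
        }
        return out;
    }

    public static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    private static List<List<Integer>> emptyPartitions(int n) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(5, 2, 9, 4, 7, 1, 8, 3);
        // Two reducers in both cases; the split point 5 is an arbitrary illustrative choice.
        System.out.println(sortEachAndConcatenate(partitionByHash(keys, 2)));
        System.out.println(sortEachAndConcatenate(partitionByRange(keys, new int[] {5})));
    }
}
```

With these sample keys, the hash-partitioned run prints [2, 4, 8, 1, 3, 5, 7, 9] (each reducer's output is sorted, the concatenation is not), while the range-partitioned run prints [1, 2, 3, 4, 5, 7, 8, 9]. In a real job the split points would come from sampling the input keys and be handed to the partitioner in a partition file, which is where the SequenceFile question above comes in; this sketch does not attempt that step.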