Return-Path:
X-Original-To: apmail-incubator-crunch-dev-archive@minotaur.apache.org
Delivered-To: apmail-incubator-crunch-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1E9EDD16C for ; Thu, 16 Aug 2012 18:43:39 +0000 (UTC)
Received: (qmail 33917 invoked by uid 500); 16 Aug 2012 18:43:39 -0000
Delivered-To: apmail-incubator-crunch-dev-archive@incubator.apache.org
Received: (qmail 33873 invoked by uid 500); 16 Aug 2012 18:43:38 -0000
Mailing-List: contact crunch-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: crunch-dev@incubator.apache.org
Delivered-To: mailing list crunch-dev@incubator.apache.org
Received: (qmail 33767 invoked by uid 99); 16 Aug 2012 18:43:38 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 Aug 2012 18:43:38 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 2CF412C5BE2 for ; Thu, 16 Aug 2012 18:43:38 +0000 (UTC)
Date: Fri, 17 Aug 2012 05:43:38 +1100 (NCT)
From: "Gabriel Reid (JIRA)"
To: crunch-dev@incubator.apache.org
Message-ID: <246908626.20316.1345142618184.JavaMail.jiratomcat@arcas>
In-Reply-To: <1853483155.85795.1342870774642.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (CRUNCH-23) PCollection#sort doesn't do a full sort on values
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CRUNCH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436211#comment-13436211 ]

Gabriel Reid commented on CRUNCH-23:
------------------------------------

I was just going to take a look into this as well -- and I've got a couple of questions.
Is the patch CRUNCH-23-sorting-issue.patch the full cumulative patch? Also, I just took a quick look at it, and it appears that it might be reliant on using SequenceFiles (and therefore it wouldn't work with Avro) -- any idea if this is the case?

> PCollection#sort doesn't do a full sort on values
> -------------------------------------------------
>
>                 Key: CRUNCH-23
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-23
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Rahul Sharma
>         Attachments: 0001-CRUNCH-23-fix-sorting.patch, CRUNCH-23-sorting-issue.patch, CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch, SortTest.java
>
>
> When a PCollection is sorted (using PCollection#sort), the sorting that is performed is only per reducer, and not an absolute sort over all values. This means that the values are not in sorted order if they are iterated over on a materialized collection. It also means that the sorted files that are output from a sort operation can not be simply concatenated to come to a single sorted file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
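
For anyone following the issue description, here is a small plain-Java sketch of why a per-reducer sort does not give a total order. It does not use Crunch or Hadoop at all: the class name SortPartitionDemo, its methods, the sample values, and the split point are all made up for illustration. It assumes values reach reducers by hash partitioning, and contrasts that with range partitioning on split points, which is the general idea behind a total-order partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation only; none of these names come from Crunch or Hadoop.
public class SortPartitionDemo {

    // Hash partitioning: the target reducer has no relation to the value's sort order.
    public static int hashPartition(int value, int numReducers) {
        return Math.floorMod(Integer.hashCode(value), numReducers);
    }

    // Range partitioning: reducer i receives the values below splitPoints[i];
    // the last reducer receives everything else. splitPoints must be ascending.
    public static int rangePartition(int value, int[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && value >= splitPoints[p]) {
            p++;
        }
        return p;
    }

    private static List<List<Integer>> emptyPartitions(int n) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    // Sorts each partition independently (the "per reducer" sort),
    // then concatenates the partitions in reducer order.
    private static List<Integer> sortAndConcat(List<List<Integer>> parts) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : parts) {
            Collections.sort(part);
            out.addAll(part);
        }
        return out;
    }

    public static List<Integer> sortWithHashPartitioning(List<Integer> values, int numReducers) {
        List<List<Integer>> parts = emptyPartitions(numReducers);
        for (int v : values) {
            parts.get(hashPartition(v, numReducers)).add(v);
        }
        return sortAndConcat(parts);
    }

    public static List<Integer> sortWithRangePartitioning(List<Integer> values, int[] splitPoints) {
        List<List<Integer>> parts = emptyPartitions(splitPoints.length + 1);
        for (int v : values) {
            parts.get(rangePartition(v, splitPoints)).add(v);
        }
        return sortAndConcat(parts);
    }

    public static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 2, 8, 1, 4, 7);
        // Two reducers, hash partitioned: each part is sorted, the concatenation is not.
        System.out.println(sortWithHashPartitioning(values, 2));               // [2, 4, 8, 1, 5, 7]
        // Two reducers, split at 5: the concatenation is fully sorted.
        System.out.println(sortWithRangePartitioning(values, new int[] {5}));  // [1, 2, 4, 5, 7, 8]
    }
}
```

In the hash-partitioned case each reducer's output is sorted on its own, but concatenating the outputs interleaves the ranges. With range partitioning every value in reducer i sorts before every value in reducer i+1, so the concatenated output is in order. Whether any of the attached patches works this way is not something this sketch claims.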