Date: Wed, 6 Feb 2013 10:31:12 +0000 (UTC)
From: "Dave Beech (JIRA)"
To: crunch-dev@incubator.apache.org
Subject: [jira] [Commented] (CRUNCH-162) Add utility function for merging output by identity reduce

    [ https://issues.apache.org/jira/browse/CRUNCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572313#comment-13572313 ]

Dave Beech commented on CRUNCH-162:
-----------------------------------

I've got a patch for this, but I've just realised the code is almost identical to something in the Sort class (or at least, to how it looked before the single reducer was hard-coded in CRUNCH-23). The "sort-pre" function followed by the GBK and ungroup is exactly what I need, but with a configurable number of reducers rather than a configurable sort order.
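To make the "sort-pre, then GBK, then ungroup" pattern concrete, here is a hedged, dependency-free Java model of it. It is not Crunch code: the class and method names are hypothetical, and a real pipeline would use Crunch's parallelDo, groupByKey(int) and ungroup instead. The sketch assumes each record is keyed by itself with an empty value (as the comment says "sort-pre" does), hash-partitions the keys across a configurable number of reducers, sorts and groups them within each partition, and then flattens the groups back to plain records, which is the job "sort-post" (or PTables.keys) does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of "key by record -> groupByKey(n) -> ungroup -> keys".
 * It does not use Crunch; it only mimics what the MapReduce shuffle does.
 */
final class IdentityReduceMerge {

    /** Merges many small input "files" into numReducers output "files". */
    static <T extends Comparable<T>> List<List<T>> merge(List<List<T>> inputs, int numReducers) {
        if (numReducers < 1) {
            throw new IllegalArgumentException("numReducers must be >= 1");
        }
        // One sorted map per reducer: key -> number of empty values grouped under it.
        List<TreeMap<T, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        // Map side: every record becomes a key with an empty value.
        for (List<T> file : inputs) {
            for (T record : file) {
                // Same formula as Hadoop's default HashPartitioner.
                int p = (record.hashCode() & Integer.MAX_VALUE) % numReducers;
                partitions.get(p).merge(record, 1, Integer::sum);
            }
        }
        // Reduce side: ungroup, keeping only the key (the PTables.keys step).
        List<List<T>> outputs = new ArrayList<>();
        for (TreeMap<T, Integer> partition : partitions) {
            List<T> out = new ArrayList<>();
            for (Map.Entry<T, Integer> group : partition.entrySet()) {
                // Emit the key once per grouped value, so duplicate records survive.
                out.addAll(Collections.nCopies(group.getValue(), group.getKey()));
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<List<String>> smallFiles = List.of(
            List.of("c", "a"), List.of("b"), List.of("a", "d"), List.of("e"));
        List<List<String>> merged = merge(smallFiles, 2);
        System.out.println(merged.size() + " output files: " + merged);
    }
}
```

Running main prints `2 output files: [[b, d], [a, a, c, e]]`: four small inputs become two larger, sorted outputs, and the duplicate "a" is kept because an identity reduce emits one record per grouped value.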
Also, I've just noticed that the "sort-post" function inside Sort duplicates PTables.keys, and I don't want to add any further duplication. Any ideas?

> Add utility function for merging output by identity reduce
> ----------------------------------------------------------
>
>                 Key: CRUNCH-162
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-162
>             Project: Crunch
>          Issue Type: Improvement
>          Components: MapReduce Patterns
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Priority: Minor
>
> Something I find myself doing reasonably often in MapReduce is to use
> the reduce step as nothing more than a means to merge data into larger
> files (using the identity reducer).
> There doesn't appear to be a neat way to do this with Crunch at the moment.
> Ref: http://mail-archives.apache.org/mod_mbox/incubator-crunch-user/201302.mbox/%3CCAFZSZPsXRxWT45c9w4ef7Ruij2exE4HP2CDNMjd%2BVc%3D9RWX-Jw%40mail.gmail.com%3E