Date: Tue, 16 Apr 2013 00:50:15 +0000 (UTC)
From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-165) Pipelines should automatically use CombineFileInputFormat where input consists of many small files

    [ https://issues.apache.org/jira/browse/CRUNCH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632449#comment-13632449 ]

Josh Wills commented on CRUNCH-165:
-----------------------------------

Cool, thanks for giving it a whirl. Will be curious to hear what the issue is.

> Pipelines should automatically use CombineFileInputFormat where input consists of many small files
> --------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-165
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-165
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>         Attachments: CRUNCH-165-jwills.patch, CRUNCH-165.patch
>
>
> Hive had a feature introduced in HIVE-74 whereby CombineFileInputFormat would be used if the input data consisted of many small files, making the resulting mapreduce jobs more efficient by giving individual mappers more data to process. This would be a nice feature for Crunch to have, too.
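
To illustrate why combining small files helps, here is a minimal sketch of the packing idea. It is not Hadoop's or Crunch's actual implementation. The function name combine_small_files, the file names, and the 128 MiB limit are all hypothetical, and the real CombineFileInputFormat also takes node and rack locality into account, which this sketch ignores.

```python
from typing import List, Tuple


def combine_small_files(
    files: List[Tuple[str, int]], max_split_bytes: int
) -> List[List[str]]:
    """Greedily pack (path, size_in_bytes) pairs into splits.

    Each split holds consecutive files whose total size does not exceed
    max_split_bytes. A single file larger than the limit gets a split
    of its own. This is an illustrative simplification only.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Start a new split when adding this file would exceed the limit.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Hypothetical input: 1000 files of 1 MiB each.
MIB = 1024 * 1024
files = [("part-%05d" % i, 1 * MIB) for i in range(1000)]
splits = combine_small_files(files, 128 * MIB)
print(len(files), "files ->", len(splits), "splits")
```

With one split per file, a job over this input would launch 1000 map tasks; packing the same files into larger splits means far fewer mappers, each with more data to process, which is the efficiency gain the issue description refers to.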