crunch-dev mailing list archives

From "Dave Beech (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-165) Pipelines should automatically use CombineFileInputFormat where input consists of many small files
Date Tue, 03 Sep 2013 11:10:54 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756524#comment-13756524 ]

Dave Beech commented on CRUNCH-165:
-----------------------------------

Great work Josh, thanks! I was looking forward to trying this out but I hit MAPREDUCE-1806
on our version of Hadoop... I'll give it another go when I have something to read from HDFS
rather than S3. 
                
> Pipelines should automatically use CombineFileInputFormat where input consists of many small files
> --------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-165
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-165
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>             Fix For: 0.8.0
>
>         Attachments: CRUNCH-165-jwills.patch, CRUNCH-165.patch, CRUNCH-165-v3.patch, CRUNCH-165-v4.patch
>
>
> Hive had a feature introduced in HIVE-74 whereby CombineFileInputFormat would be used
> if the input data consisted of many small files, making the resulting mapreduce jobs more
> efficient by giving individual mappers more data to process. This would be a nice feature
> for Crunch to have, too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
