hadoop-common-dev mailing list archives

From "Subramaniam Krishnan (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3387) Custom Splitter for handling many small files
Date Wed, 14 May 2008 07:31:55 GMT
Custom Splitter for handling many small files

                 Key: HADOOP-3387
                 URL: https://issues.apache.org/jira/browse/HADOOP-3387
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Subramaniam Krishnan
             Fix For: 0.18.0

By default, Hadoop allocates at least one map per input file, irrespective of the file's size. This is
inefficient when a job has a large number of small files: for example, a job over 2000 files of 100 KB
each is allocated 2000 maps.

The Custom Multi File Splitter packs small files into a single split until the DFS block size
is reached.
It also handles big files by splitting them at block-size boundaries and combining the remainders
(if any) into further splits of up to the block size.
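The splitting rule above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the patch attached to HADOOP-3387: it does not use Hadoop's InputFormat/InputSplit classes, and the names BlockPackingSplitter, InputFile and Chunk are hypothetical. It assumes a packed split is closed as soon as the next piece would push it past the block size, and that small files and remainders are packed in input order (no sorting and no data-locality handling).

```java
import java.util.ArrayList;
import java.util.List;

public class BlockPackingSplitter {

    /** Hypothetical input description: a file path and its size in bytes. */
    record InputFile(String path, long size) {}

    /** A byte range [offset, offset + length) of one input file. */
    record Chunk(String path, long offset, long length) {}

    /**
     * Groups file ranges into splits of at most blockSize bytes (assumed rule).
     * Each full block of a big file becomes its own split; small files and the
     * leftover tails of big files are packed together greedily.
     */
    static List<List<Chunk>> split(List<InputFile> files, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<Chunk>> splits = new ArrayList<>();
        List<Chunk> leftovers = new ArrayList<>();

        for (InputFile f : files) {
            long fullBlocks = f.size() / blockSize;
            // One split per whole block of a big file.
            for (long b = 0; b < fullBlocks; b++) {
                splits.add(List.of(new Chunk(f.path(), b * blockSize, blockSize)));
            }
            // The remainder (the whole file, if it is small) is packed later.
            // Zero-length files produce no chunk at all.
            long remainder = f.size() % blockSize;
            if (remainder > 0) {
                leftovers.add(new Chunk(f.path(), fullBlocks * blockSize, remainder));
            }
        }

        // Every leftover is smaller than blockSize, so it always fits in a fresh split.
        List<Chunk> current = new ArrayList<>();
        long currentBytes = 0;
        for (Chunk c : leftovers) {
            if (currentBytes + c.length() > blockSize) {
                splits.add(current);            // close the full split
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(c);
            currentBytes += c.length();
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long kb = 1024L;
        long blockSize = 64L * 1024 * kb;       // assumed 64 MB DFS block size
        List<InputFile> files = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            files.add(new InputFile("small-" + i, 100 * kb));
        }
        System.out.println(split(files, blockSize).size()); // prints 4
    }
}
```

With the 2000 files of 100 KB from the example, and assuming a 64 MB block size with 1 KB = 1024 bytes, each packed split holds 655 files (655 × 100 KB fits in 64 MB, 656 does not), so the job needs 4 maps instead of 2000.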

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
