hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-93) allow minimum split size configurable
Date Fri, 17 Mar 2006 20:01:20 GMT
allow minimum split size configurable
-------------------------------------

         Key: HADOOP-93
         URL: http://issues.apache.org/jira/browse/HADOOP-93
     Project: Hadoop
        Type: Bug
    Reporter: Hairong Kuang


The default split size is currently the size of a block (32M), but for a SequenceFile it is set
to SequenceFile.SYNC_INTERVAL (2K). We have a Map/Reduce application that works on crawled
documents. Its input data consists of 356 sequence files, each around 30G in size. The
jobtracker takes forever to launch the job because it needs to generate 356*30G/2K map tasks,
which is several billion!
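
To make the scale concrete, the arithmetic can be worked through in a few lines. This is only a rough sketch: it assumes binary units (30G = 30 GiB, 2K = 2 KiB, 32M = 32 MiB) and one map task per split, and the helper name estimated_map_tasks is made up for illustration.

```python
# Assumption: binary units, i.e. "2K" = 2 KiB, "32M" = 32 MiB, "30G" = 30 GiB.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def estimated_map_tasks(num_files, file_size, split_size):
    """Estimate the task count, assuming one map task per split of each file.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    splits_per_file = -(-file_size // split_size)  # ceiling division
    return num_files * splits_per_file


# Split size of SequenceFile.SYNC_INTERVAL (2K), as described in the report.
tasks_at_sync_interval = estimated_map_tasks(356, 30 * GIB, 2 * KIB)

# Split size of one block (32M), for comparison.
tasks_at_block_size = estimated_map_tasks(356, 30 * GIB, 32 * MIB)

print(tasks_at_sync_interval)  # 5599395840
print(tasks_at_block_size)     # 341760
```

Under these assumptions, splitting at the block size instead of the sync interval would cut the task count by a factor of 16384 (32 MiB / 2 KiB).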

The proposed solution is to make the minimum split size configurable so that the programmer
can control the number of tasks generated.
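
One way such a setting could behave is sketched below. This is not the actual patch: it assumes the configured minimum acts as a lower bound on whatever split size the input format would otherwise choose, and the names effective_split_size and count_splits are hypothetical. Binary units are assumed as well.

```python
# Assumption: binary units, i.e. "2K" = 2 KiB, "30G" = 30 GiB.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def effective_split_size(format_split_size, min_split_size):
    """Apply a configured minimum as a lower bound on the split size.

    Hypothetical rule for illustration; the real implementation may differ.
    """
    return max(format_split_size, min_split_size)


def count_splits(num_files, file_size, split_size):
    """Count splits, assuming each file is cut into equal-sized pieces."""
    return num_files * -(-file_size // split_size)  # ceiling division


# With no minimum configured, the 2K sync interval still determines the split.
unbounded = effective_split_size(2 * KIB, 0)

# With a 1 GiB minimum, each 30 GiB file yields 30 splits.
bounded = effective_split_size(2 * KIB, 1 * GIB)

print(count_splits(356, 30 * GIB, unbounded))  # 5599395840
print(count_splits(356, 30 * GIB, bounded))    # 10680
```

Treating the setting as a lower bound leaves existing jobs unchanged when it is unset, while letting the programmer trade task count against per-task input size.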


