hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-229) hadoop cp should generate a better number of map tasks
Date Thu, 18 May 2006 00:24:05 GMT
hadoop cp should generate a better number of map tasks 

         Key: HADOOP-229
         URL: http://issues.apache.org/jira/browse/HADOOP-229
     Project: Hadoop
        Type: Bug

  Components: fs  
    Reporter: Yoram Arnon
 Assigned to: Milind Bhandarkar 
    Priority: Minor

hadoop cp currently assigns 10 files to each map task.
With a small number of large files on a large cluster (say, 300 files of 30GB each on
a 300-node cluster), this yields only 30 map tasks, so most of the cluster sits idle and
execution times are long.
It would be better to assign files per task such that the entire cluster is utilized: one file
per map, with a cap of 10000 maps in total, so as not to overburden the job tracker.
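
The arithmetic behind this proposal can be sketched as follows. This is only an illustration of the numbers in the report, not the actual hadoop cp code; the function names and constants are hypothetical, and the "current" count assumes files are simply grouped 10 per map with the remainder rounded up.

```python
# Hypothetical sketch of the map-count arithmetic in this report.
# Names and constants are illustrative, not taken from the Hadoop source.

FILES_PER_MAP_CURRENT = 10  # current behaviour described in the report
MAX_MAPS = 10000            # proposed cap, to protect the job tracker


def ceil_div(n: int, d: int) -> int:
    """Integer ceiling division for non-negative n and positive d."""
    return (n + d - 1) // d


def current_num_maps(num_files: int) -> int:
    """Maps generated when every map is handed 10 files (assumed rounding up)."""
    return ceil_div(num_files, FILES_PER_MAP_CURRENT)


def proposed_num_maps(num_files: int, max_maps: int = MAX_MAPS) -> int:
    """One file per map, capped at max_maps in total."""
    return min(num_files, max_maps)


def files_per_map(num_files: int, num_maps: int) -> int:
    """Largest number of files any single map must copy."""
    return ceil_div(num_files, num_maps) if num_maps > 0 else 0


if __name__ == "__main__":
    for n in (300, 25000):
        cur, new = current_num_maps(n), proposed_num_maps(n)
        print(n, "files:", cur, "maps now,", new, "maps proposed,",
              files_per_map(n, new), "file(s) per map at most")
```

For the 300-file example, the current scheme gives 30 maps while the proposal gives 300, one per file. Once the file count exceeds the cap, each map copies more than one file again, for example 3 at most for 25000 files.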

This message is automatically generated by JIRA.