hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Closed: (HADOOP-229) hadoop cp should generate a better number of map tasks
Date Mon, 05 Jun 2006 23:09:01 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-229?page=all ]
Doug Cutting closed HADOOP-229:

> hadoop cp should generate a better number of map tasks
> ------------------------------------------------------
>          Key: HADOOP-229
>          URL: http://issues.apache.org/jira/browse/HADOOP-229
>      Project: Hadoop
>         Type: Bug
>   Components: fs
>     Reporter: Yoram Arnon
>     Assignee: Milind Bhandarkar
>     Priority: Minor
>      Fix For: 0.3.0

> hadoop cp currently assigns 10 files to copy per map task.
> In the case of a small number of large files on a large cluster (say, 300 files of 30GB each on a 300-node cluster), this results in long execution times.
> It would be better to assign files to tasks so that the entire cluster is utilized: one file per map, with a cap of 10000 maps total, so as not to overburden the JobTracker.
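
The heuristic proposed above, one map task per input file with a hard cap, could be sketched as follows. This is an illustrative snippet, not the actual Hadoop implementation; the class and method names are hypothetical:

```java
// Hypothetical sketch of the proposed map-count heuristic for hadoop cp:
// one map per file, capped so the JobTracker is not overburdened.
public class MapCountSketch {
    // Cap on total maps, taken from the issue description.
    static final int MAX_MAPS = 10000;

    // Returns the number of map tasks for a copy job with numFiles input files.
    static int computeNumMaps(int numFiles) {
        return Math.min(numFiles, MAX_MAPS);
    }

    public static void main(String[] args) {
        // The example from the report: 300 files on a 300-node cluster
        // yields 300 maps, one per file, so every node gets work.
        System.out.println(computeNumMaps(300)); // prints 300
    }
}
```

Under this scheme, the 300-file scenario from the report would run 300 concurrent copies instead of 30 (at 10 files per map), while a job with millions of small files would still be limited to 10000 maps.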

This message is automatically generated by JIRA.
If you think it was sent incorrectly, contact one of the administrators:
For more information on JIRA, see:
