hadoop-common-dev mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-960) Incorrect number of map tasks when there are multiple input files
Date Sat, 16 Jul 2011 18:26:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HADOOP-960.

    Resolution: Invalid

The ability to specify "mapred.map.tasks" is going away with the new MR API. The only _right_
way to control splits is to write your own InputFormat that splits the input the way you need.
The default behaviour has worked for many: it is data-locality aware (as long as that
information is available), its split size is tunable, and it can be made to process whole
files with a very simple subclass/configuration.

Resolving as invalid (now, and onwards) since InputFormat#getSplits(…) is not going anywhere,
and can do what you want it to.
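
To illustrate why a requested map count is only a hint when splits are planned per file,
here is a minimal standalone sketch. It is not Hadoop's code: the class and method names are
hypothetical, and the sizing rule (goal size capped by block size, floored by a minimum) and
the 1.1 slack factor are assumptions modeled on how FileInputFormat-style planning behaves.
The file lengths and block size are made-up values chosen to match the numbers in the report
quoted below.

```java
/** Standalone model of per-file split planning; an illustration, not Hadoop's actual code. */
class SplitCountModel {

    // Assumed slack factor: a tail up to 10% larger than splitSize is kept as one split.
    static final double SPLIT_SLOP = 1.1;

    /** Assumed sizing rule: aim for goalSize, capped by blockSize, floored by minSize. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Counts the splits planned for a single file of the given length. */
    static int splitsForFile(long length, long splitSize, boolean splittable) {
        if (length == 0) {
            return 0;
        }
        if (!splittable) {
            return 1; // whole-file processing: one split per file
        }
        int count = 0;
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            count++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            count++; // the tail of the file becomes its own split
        }
        return count;
    }

    /** Sums the per-file split counts; requestedMaps only influences the goal size. */
    static int totalSplits(long[] fileLengths, int requestedMaps,
                           long minSize, long blockSize, boolean splittable) {
        long totalSize = 0;
        for (long length : fileLengths) {
            totalSize += length;
        }
        long goalSize = totalSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int total = 0;
        for (long length : fileLengths) {
            total += splitsForFile(length, splitSize, splittable);
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // hypothetical block size
        long[] fiveFiles = {256_000, 256_000, 256_000, 256_000, 256_000};
        long[] oneFile = {1_280_000};

        // Five files, 128 maps requested: each file needs 26 splits, so 130 in total.
        System.out.println(totalSplits(fiveFiles, 128, 1, blockSize, true));  // prints 130
        // The same bytes in a single file give exactly the requested count.
        System.out.println(totalSplits(oneFile, 128, 1, blockSize, true));    // prints 128
        // Non-splittable input: one split per file, regardless of the hint.
        System.out.println(totalSplits(fiveFiles, 128, 1, blockSize, false)); // prints 5
    }
}
```

In this model the count is rounded up per file, which is why the total lands on a multiple of
the number of files. A custom InputFormat is the place to change that rule.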

Regarding splits by number of records: MR now has NLineInputFormat as well, which does open
and read through the file.
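
As a rough illustration of record-count splitting, the sketch below groups input into splits
of at most N lines each. It is a plain-Java model, not NLineInputFormat itself: the class and
method names are hypothetical, and it holds lines in memory, which a real InputFormat would
not do. It does show why the input has to be read end to end, since line boundaries cannot be
known from the file length alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Standalone model of N-lines-per-split planning; an illustration only. */
class LineSplitModel {

    /** Reads the whole input and groups it into splits of at most linesPerSplit lines. */
    static List<List<String>> splitByLines(Reader input, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            // Every line must be read: there is no way to skip ahead by record count.
            while ((line = reader.readLine()) != null) {
                current.add(line);
                if (current.size() == linesPerSplit) {
                    splits.add(current);
                    current = new ArrayList<>();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!current.isEmpty()) {
            splits.add(current); // the last split may hold fewer lines
        }
        return splits;
    }

    public static void main(String[] args) {
        String text = "a\nb\nc\nd\ne\nf\ng\n";
        List<List<String>> splits = splitByLines(new StringReader(text), 3);
        System.out.println(splits); // prints [[a, b, c], [d, e, f], [g]]
    }
}
```

Here the number of splits follows from the record count and N, not from the number of bytes
or any requested map count.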

> Incorrect number of map tasks when there are multiple input files
> -----------------------------------------------------------------
>                 Key: HADOOP-960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-960
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.10.1
>            Reporter: Andrew McNabb
>            Priority: Minor
> This problem happens with hadoop-streaming and possibly elsewhere.  If there are 5 input
> files, it will create 130 map tasks, even if mapred.map.tasks=128.  The number of map tasks
> is incorrectly set to a multiple of the number of files.  (I wrote a much more complete bug
> report, but Jira lost it when it had an error, so I'm not in the mood to write it all again)

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

