hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-960) Incorrect number of map tasks when there are multiple input files
Date Tue, 30 Jan 2007 18:00:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468718 ]

Owen O'Malley commented on HADOOP-960:

Oops, sorry about that. To actually get 128 splits you would need to write your own InputFormat
and implement getSplits yourself. That said, it is usually better to take the extra maps and
get data locality on the map input.

> Incorrect number of map tasks when there are multiple input files
> -----------------------------------------------------------------
>                 Key: HADOOP-960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-960
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Andrew McNabb
> This problem happens with hadoop-streaming and possibly elsewhere.  If there are 5 input
> files, it will create 130 map tasks, even if mapred.map.tasks=128.  The number of map tasks
> is incorrectly set to a multiple of the number of files.  (I wrote a much more complete bug
> report, but Jira lost it when it had an error, so I'm not in the mood to write it all again.)
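
For illustration only, the arithmetic behind the reported numbers can be sketched as below. This is a simplified model, not the actual Hadoop code: it assumes each file is split independently into chunks of about totalSize / requestedSplits bytes, that no split spans two files, and it ignores block size and minimum split size. The class name, the method name and the 64 MiB file size are hypothetical; the report does not state the real file sizes.

```java
import java.util.Arrays;

public class SplitCountSketch {

    // Simplified model (an assumption, not Hadoop's implementation):
    // every file is cut into pieces of at most goalSize bytes, and a
    // piece never crosses a file boundary, so each file rounds up.
    static long countSplits(long[] fileSizes, int requestedSplits) {
        long totalSize = 0;
        for (long size : fileSizes) {
            totalSize += size;
        }
        // Target bytes per split if the request could be met exactly.
        long goalSize = Math.max(1, totalSize / requestedSplits);

        long splits = 0;
        for (long size : fileSizes) {
            // Ceiling division: a partial trailing piece is still a split.
            splits += (size + goalSize - 1) / goalSize;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five equally sized files (hypothetical 64 MiB each).
        long[] files = new long[5];
        Arrays.fill(files, 64L * 1024 * 1024);

        // 128 requested splits over 5 files is 25.6 per file,
        // which rounds up to 26 per file.
        System.out.println(countSplits(files, 128)); // prints 130
    }
}
```

Under these assumptions the per-file rounding alone accounts for 130 tasks from a request of 128, which is consistent with the count being a multiple of the number of files.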

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
