hadoop-mapreduce-issues mailing list archives

From "Noel C. F. Codella, Ph.D. (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5313) JobTracker Creates Empty Mapper Task, and a Mapper Task with 2 FileSplits.
Date Sun, 09 Jun 2013 07:14:20 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noel C. F. Codella, Ph.D. updated MAPREDUCE-5313:
-------------------------------------------------

    Description: 
When reading an input text file, the JobTracker appears to assign the first two FileSplits
to a single Mapper Task, then assigns an EMPTY FileSplit (end of file) to another Mapper Task,
which finishes instantaneously. This can unbalance the job, since one map task is now twice
as large as the others.

In "src/mapred/org/apache/hadoop/mapred/LineRecordReader.java", line 110, there is a comment
about skipping the first line of the input file by default, since "next()" reads two lines
anyway. This was not the behavior in 0.20.2, which did not have this problem.

This seems possibly related to:

"HADOOP-4010. Change semantics for LineRecordReader to read an additional
    line per split- rather than moving back one character in the stream- to
    work with splittable compression codecs. (Abdul Qadeer via cdouglas)"

It seems this change was not implemented properly and leads to the issue described above
when the input file is plain text.
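
To make the suspected interaction concrete, here is a minimal sketch in Java of the "read one additional line per split" rule quoted from HADOOP-4010. It is a simplified in-memory model, not Hadoop's actual LineRecordReader code: the class SplitLineSketch and the methods readSplit and skipLine are hypothetical names, and the assumed rule is that every split except the first discards its first line, while every split keeps reading as long as a line starts at or before the split's end offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of per-split line reading; NOT Hadoop's code.
class SplitLineSketch {

    /** Returns the lines a reader would emit for the byte range [start, start + length). */
    static List<String> readSplit(byte[] data, long start, long length) {
        long end = start + length;
        int pos = (int) start;

        // Assumed rule: any split that does not begin at offset 0 discards its
        // first line, because the previous split is expected to have read it.
        if (start != 0) {
            pos = skipLine(data, pos);
        }

        List<String> lines = new ArrayList<>();
        // Assumed rule: keep reading while a line STARTS at or before 'end',
        // so one extra line beyond the split boundary may be consumed.
        while (pos <= end && pos < data.length) {
            int next = skipLine(data, pos);
            // Exclude the trailing newline, if any, from the emitted line.
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    /** Returns the offset just past the next '\n', or data.length if none remains. */
    static int skipLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }
}
```

Under these assumptions, a 10-byte input "aaaa\nbbbb\n" cut into two 5-byte splits yields both lines from the first split and none from the second: the boundary falls exactly on a line start, so the first reader consumes the line beginning at its end offset and the second reader skips it. That matches the symptom reported here, but it only illustrates how the semantics could produce it and is not a confirmed diagnosis.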


  was:
When reading an input file, the Job Tracker seems to assign the first two FileSplits to a
single Mapper Task, then assigns an EMPTY FileSplit (end of file) to a Mapper Task, which
finishes instantaneously. This can affect job balance, since one map job is now twice as big
as the others.

In "src/mapred/org/apache/hadoop/mapred/LineRecordReader.java", line 110, there is a comment
about skipping the first line of the input file by default, since "next()" reads two lines
anyway. This was not the behavior in 0.20.2, which did not have this problem.

It seems this was not implemented properly and is leading to the issue described above.


    
> JobTracker Creates Empty Mapper Task, and a Mapper Task with 2 FileSplits.
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5313
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5313
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.2.0
>         Environment: Linux
>            Reporter: Noel C. F. Codella, Ph.D.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
