hadoop-mapreduce-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value
Date Wed, 20 Mar 2013 05:45:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607305#comment-13607305 ]

Karthik Kambatla commented on MAPREDUCE-5028:
---------------------------------------------

Uploaded a patch with one additional change. Details below:
# The trunk patch from March 4 worked for large values of {{io.sort.mb}}, but caused a number of test failures.
# The trunk patch from March 19 fixed those test failures by removing an adjustment to a call to {{DataInputBuffer#reset()}} that wasn't supposed to be there.
# However, that adjustment was partly responsible for the original patch working for large values of {{io.sort.mb}}: removing it fixed the test failures, but jobs with large values of {{io.sort.mb}} started failing again.
# The fix is an adjustment to the {{DataInputBuffer#reset()}} call in {{ReduceContextImpl}}; a complementary change to {{ReduceContext}} exists in the branch-1 patch. While working on the original patch, I did make this adjustment, but later removed it after noticing negative offsets in jobs (the same behavior as in the test failures). The mistake was undoing the wrong adjustment; the latest patch (19 Mar 22:26) fixes it.
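To illustrate the failure mode (not the actual patch): {{DataInputBuffer#reset(input, start, length)}} exposes a window {{input[start .. start+length)}} over a shared backing array, so a caller sitting at a non-zero position must pass {{end - position}}, not the raw end offset, as the length. The sketch below is a simplified stand-in using only the JDK; the class and method names are hypothetical, and the "double offset" case stands in for the kind of miscomputed window that surfaces as the {{EOFException}} in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Simplified analogue of DataInputBuffer#reset(byte[], int, int): a window
// [start, start + length) over a shared backing array. Hypothetical names;
// this is a sketch of the offset arithmetic, not Hadoop's implementation.
public class ResetWindowSketch {

    static int readIntFromWindow(byte[] backing, int start, int length) throws IOException {
        // Mimics resetting the buffer to backing[start .. start+length)
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(backing, start, length));
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        // A 4-byte serialized record (value 42) placed at offset 4 of a larger
        // spill buffer, mimicking a non-zero buffer position.
        byte[] backing = new byte[8];
        int position = 4;
        backing[position + 3] = 42;  // bytes [4..7] = {0, 0, 0, 42}

        // Correct window: start = position, length = end - position
        System.out.println(readIntFromWindow(backing, position, 4)); // prints 42

        // Miscomputed window: applying the offset twice starts the window past
        // the record, so readInt() runs out of bytes, as in the reported trace.
        try {
            readIntFromWindow(backing, position + position, 4);
        } catch (EOFException e) {
            System.out.println("EOFException, as in the spill failure");
        }
    }
}
```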

The following steps validate the patch:
# Ran *all* tests under hadoop-mapreduce-project 
# Ran (pi 4 1000), (teragen 4GB data - 4 mappers), (wordcount on teragen output) on a pseudo-dist
cluster with dfs.block.size=64MB, java.opts=1GB, io.sort.mb=256MB
# Ran (pi 4 1000), (teragen 4GB data - 2 mappers), (wordcount on teragen output) on a pseudo-dist
cluster with dfs.block.size=2GB, java.opts=2.5GB, io.sort.mb=1280MB
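For reference, the second pseudo-dist run above corresponds to configuration properties along these lines (branch-1 style names taken from the steps above; the exact {{-Xmx}} spelling of "java.opts=2.5GB" is an assumption):

{noformat}
<!-- mapred-site.xml -->
<property><name>io.sort.mb</name><value>1280</value></property>
<property><name>mapred.child.java.opts</name><value>-Xmx2560m</value></property>

<!-- hdfs-site.xml -->
<property><name>dfs.block.size</name><value>2147483648</value></property> <!-- 2 GB -->
{noformat}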

Please suggest any additional testing that you think is required.

Thanks again Chris for immediately notifying us of the issue. Sorry Alejandro and Bobby for
the additional trouble.

> Maps fail when io.sort.mb is set to high value
> ----------------------------------------------
>
>                 Key: MAPREDUCE-5028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>             Fix For: 1.2.0, 0.23.7, 2.0.5-beta
>
>         Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch,
mr-5028-trunk.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt
>
>
> Verified the problem exists on branch-1 with the following configuration:
> Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280,
dfs.block.size=2147483648
> Run teragen to generate 4 GB data
> Maps fail when you run wordcount on this configuration with the following error: 
> {noformat}
> java.io.IOException: Spill failed
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
> 	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readInt(DataInputStream.java:375)
> 	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> 	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
> 	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
> 	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
