hadoop-mapreduce-user mailing list archives

From David Ginzburg <ginz...@hotmail.com>
Subject RE: Intermediate merge failed
Date Mon, 24 Jan 2011 08:09:15 GMT

Hi,
Thank you.
Before changing the parameters you suggested, I tried running the job with 128 reducers instead
of 64, and it completed.
I will wait to see whether this fix holds consistently before changing anything in mapred-site.xml,
since this is a production environment.
I am still wondering what the root cause of this exception is; documentation for it is hard to find.
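
For reference, one way to raise the reducer count for a single job without touching mapred-site.xml
is in the job driver. A minimal sketch, assuming the old org.apache.hadoop.mapred API (the driver
class and paths are placeholders, not our actual job):

// Per-job override of the reducer count; the cluster-wide
// mapred.reduce.tasks=64 in mapred-site.xml stays untouched.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

public class ReducerOverrideDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReducerOverrideDriver.class);
    conf.setJobName("reducer-override-example");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    conf.setNumReduceTasks(128);   // 128 reducers for this job only
    JobClient.runJob(conf);
  }
}

The same per-job override also works from the command line when the driver goes through
ToolRunner, e.g. -D mapred.reduce.tasks=128.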
  

Date: Sun, 23 Jan 2011 11:41:05 -0500
Subject: Re: Intermediate merge failed
From: thiruvathuru@gmail.com
To: mapreduce-user@hadoop.apache.org

Try modifying some of these parameters....
io.sort.mb = 350
io.sort.factor = 100
io.file.buffer.size = 131072
mapred.child.java.opts = -Xms1024m -Xmx1024m
mapred.reduce.parallel.copies = 8
mapred.tasktracker.map.tasks.maximum = 12

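For example, most of these can be applied per job from the driver rather than cluster-wide.
A rough sketch with the old mapred API (the driver class name is a placeholder, and the values
are starting points to tune, not definitive settings):

// Apply the suggested shuffle/sort tuning on the JobConf for one job.
JobConf conf = new JobConf(MyDriver.class);              // placeholder driver class
conf.set("io.sort.mb", "350");                           // map-side sort buffer, in MB
conf.set("io.sort.factor", "100");                       // streams merged at once
conf.set("io.file.buffer.size", "131072");               // 128 KB I/O buffers
conf.set("mapred.child.java.opts", "-Xms1024m -Xmx1024m");
conf.set("mapred.reduce.parallel.copies", "8");          // parallel shuffle fetchers per reduce
// mapred.tasktracker.map.tasks.maximum (slots) is read by the TaskTracker
// daemon from its own mapred-site.xml, so it cannot be set per job.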

-Raja Thiruvathuru


On Sun, Jan 23, 2011 at 9:26 AM, David Ginzburg <ginzman@hotmail.com> wrote:

Hi,
My cluster has 22 nodes, each running a DataNode and a TaskTracker with 8 map slots and 4 reduce slots,
and a 1.5G max heap per task.
I use Cloudera CDH 2.
I have a specific job that consistently fails in the reduce phase. It uses 64 reducers, a 64M block
size, and LZO-compressed map output.


The same exception appears on all failed reduce tasks:

The reduce copier failed
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Intermediate merge failed
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2651)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2576)
Caused by: java.lang.RuntimeException: java.io.EOFException
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:128)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2635)
        ... 1 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.Text.readFields(Text.java:265)
        at com.conduit.UserLoginLog.Distinct2ActiveUsersKey.readFields(Distinct2ActiveUsersKey.java:114)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
        ... 8 more
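
For context, Distinct2ActiveUsersKey is our custom key class; the deepest cause is an EOFException
thrown from its readFields() while the in-memory merger deserializes keys in order to compare them.
A custom key has to read back exactly the bytes its write() produced, in the same order; here is a
minimal illustrative sketch of that symmetry (field names are made up, not the real class):

// Illustration only: write() and readFields() must be exact mirrors,
// otherwise the merge-time comparator hits EOF in the middle of a record.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class ExampleUserKey implements WritableComparable<ExampleUserKey> {
  private Text userId = new Text();   // hypothetical fields
  private long day;

  public void write(DataOutput out) throws IOException {
    userId.write(out);
    out.writeLong(day);
  }

  public void readFields(DataInput in) throws IOException {
    userId.readFields(in);            // mirrors write() exactly
    day = in.readLong();
  }

  public int compareTo(ExampleUserKey o) {
    int c = userId.compareTo(o.userId);
    if (c != 0) return c;
    return day < o.day ? -1 : (day == o.day ? 0 : 1);
  }
}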

I do expect a relatively large map output from this job.

My mapred-site.xml contains

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <final>false</final>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
  <final>false</final>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>64</value>
</property>

<property>
  <name>mapred.job.reduce.input.buffer.percent</name>
  <value>0.9</value>
</property>

<property>
  <name>mapred.job.shuffle.merge.percent</name>
  <value>0.8</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1536m -Djava.library.path=/usr/lib/hadoop/lib/native -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false</value>
</property>

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>


Can anyone speculate as to what's causing this? How can I at least make the job complete?


-- 

Raja Thiruvathuru

 		 	   		  