hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: MapReduce jobs hanging or failing near completion
Date Tue, 19 Jul 2011 20:50:14 GMT
Is this reproducible? If so, I'd urge you to check your local disks...

Arun
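Arun's suggestion to check the local disks can be turned into a quick scripted probe. The following is a minimal sketch, not something posted in this thread. It assumes the directories to check are the ones listed in your `mapred.local.dir` setting (for example `/mnt/hadoop/mapred/local`, the root of the path in the trace below). It only catches gross problems: a missing or unmounted directory, a directory that is not writable, low free space, or a failed write/read round trip. The class name `LocalDirCheck` and the 1 GiB threshold are hypothetical choices. It needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sanity check for TaskTracker local directories.
 * Pass the entries of your own mapred.local.dir setting as arguments.
 */
public class LocalDirCheck {

    /** Returns human-readable problems found for one directory (empty list if none). */
    public static List<String> check(Path dir, long minFreeBytes) {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            problems.add(dir + ": not a directory (missing or unmounted?)");
            return problems;
        }
        if (!Files.isWritable(dir)) {
            problems.add(dir + ": not writable");
            return problems;
        }
        try {
            // Usable space on the volume that holds this directory.
            long free = Files.getFileStore(dir).getUsableSpace();
            if (free < minFreeBytes) {
                problems.add(dir + ": only " + free + " bytes usable");
            }
            // Write and read back a small probe file to surface I/O errors.
            Path probe = Files.createTempFile(dir, "disk-check", ".tmp");
            try {
                byte[] data = "probe".getBytes(StandardCharsets.UTF_8);
                Files.write(probe, data);
                if (!Arrays.equals(data, Files.readAllBytes(probe))) {
                    problems.add(dir + ": read-back mismatch");
                }
            } finally {
                Files.deleteIfExists(probe);
            }
        } catch (IOException e) {
            problems.add(dir + ": I/O error: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        long minFree = 1L << 30; // 1 GiB: an arbitrary threshold, tune to your map output sizes
        int bad = 0;
        for (String arg : args) {
            List<String> problems = check(Paths.get(arg), minFree);
            if (problems.isEmpty()) {
                System.out.println(arg + ": OK");
            } else {
                problems.forEach(System.out::println);
                bad++;
            }
        }
        // Non-zero exit status if any directory reported a problem.
        System.exit(bad == 0 ? 0 : 1);
    }
}
```

Run it on each TaskTracker node with one argument per local directory, e.g. `java LocalDirCheck /mnt/hadoop/mapred/local`. A single passing probe does not rule out intermittent volume errors, so the node's kernel log is still worth reading alongside it.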

On Jul 19, 2011, at 12:41 PM, Kai Ju Liu wrote:

> Hi Marcos. The issue appears to be the following: a reduce task is unable to fetch results
> from a map task on HDFS. The map task is re-run, but the new map attempt is then unable to
> read a file it needs in order to finish. Here is the error from the second map attempt:
> java.io.FileNotFoundException: /mnt/hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201107171642_0560/attempt_201107171642_0560_m_000292_1/output/spill0.out
> 	at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
> 	at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
> 	at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
> 	at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
> 	at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> 	at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1547)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1179)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:262)
> 
> I have been having general difficulties with HDFS on EBS, which pointed me in this direction.
> Does this sound like a possible hypothesis to you? Thanks!
> 
> 
> Kai Ju
> 
> P.S. I am migrating off of HDFS on EBS, so I will post back with further results as soon
> as I have them.
> On Thu, Jul 7, 2011 at 6:36 PM, Marcos Ortiz <mlortiz@uci.cu> wrote:
> 
> 
> On 7/7/2011 8:43 PM, Kai Ju Liu wrote:
> 
> Over the past week or two, I've run into an issue where MapReduce jobs
> hang or fail near completion. The percent completion of both map and
> reduce tasks is often reported as 100%, but the actual number of
> completed tasks is less than the total number. It appears that either
> tasks backtrack and need to be restarted or the last few reduce tasks
> hang interminably on the copy step.
> 
> In certain cases, the jobs actually complete. In other cases, I can't
> wait long enough and have to kill the job manually.
> 
> My Hadoop cluster is hosted in EC2 on instances of type c1.xlarge with 4
> attached EBS volumes. The instances run Ubuntu 10.04.1 with the
> 2.6.32-309-ec2 kernel, and I'm currently using Cloudera's CDH3u0
> distribution. Has anyone experienced similar behavior in their clusters,
> and if so, had any luck resolving it? Thanks!
> 
> Can you post your NN and DN log files here?
> Regards
> 
> Kai Ju
> 
> -- 
> Marcos Luís Ortíz Valmaseda
>  Software Engineer (UCI)
>  Linux User # 418229
>  http://marcosluis2186.posterous.com
>  http://twitter.com/marcosluis2186
> 
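The failing path in Kai Ju's trace also records which attempt failed. In Hadoop 0.20-era releases such as CDH3, a task attempt ID has the form `attempt_<jobtracker start time>_<job number>_<m|r>_<task number>_<attempt number>`, with attempt numbers starting at 0, so the `_1` suffix marks the re-run of map task 292. The sketch below is a hypothetical helper, not a Hadoop class (Hadoop's own `TaskAttemptID.forName` parses these IDs); it pulls the fields out of a local-directory path and assumes only the `m` and `r` task types.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for 0.20-style task attempt IDs embedded in local paths. */
public final class AttemptIdParser {

    // attempt_<jobtracker start time>_<job number>_<m|r>_<task number>_<attempt number>
    private static final Pattern ATTEMPT =
            Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    /** Parsed components of one task attempt ID. */
    public static final class AttemptId {
        public final String jobTrackerId;
        public final int jobNumber;
        public final boolean isMap;
        public final int taskNumber;
        public final int attemptNumber; // 0 = first attempt, 1 = first re-run, ...

        AttemptId(String jobTrackerId, int jobNumber, boolean isMap,
                  int taskNumber, int attemptNumber) {
            this.jobTrackerId = jobTrackerId;
            this.jobNumber = jobNumber;
            this.isMap = isMap;
            this.taskNumber = taskNumber;
            this.attemptNumber = attemptNumber;
        }

        /** Rebuilds the job ID, which names the enclosing jobcache directory. */
        public String jobId() {
            return String.format("job_%s_%04d", jobTrackerId, jobNumber);
        }
    }

    /** Returns the first attempt ID found anywhere in the text, or null if there is none. */
    public static AttemptId find(String text) {
        Matcher m = ATTEMPT.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new AttemptId(m.group(1), Integer.parseInt(m.group(2)),
                m.group(3).equals("m"), Integer.parseInt(m.group(4)),
                Integer.parseInt(m.group(5)));
    }

    public static void main(String[] args) {
        String path = "/mnt/hadoop/mapred/local/taskTracker/hadoop/jobcache/"
                + "job_201107171642_0560/attempt_201107171642_0560_m_000292_1/output/spill0.out";
        AttemptId id = find(path);
        System.out.println(id.jobId() + " " + (id.isMap ? "map" : "reduce")
                + " task " + id.taskNumber + ", attempt " + id.attemptNumber);
    }
}
```

Run on the path from the trace, `main` prints `job_201107171642_0560 map task 292, attempt 1`, i.e. the spill file that went missing belonged to the second attempt of that map task.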

