hadoop-common-user mailing list archives

From Sudhir Vallamkondu <Sudhir.Vallamko...@icrossing.com>
Subject Re: Problem identifying cause of a failed job
Date Wed, 17 Nov 2010 03:46:51 GMT
Try upgrading to JVM 6.0_21. We have had JVM issues with 6.0_18 and Hadoop.
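If it helps, a minimal sketch of switching the JVM: set JAVA_HOME in conf/hadoop-env.sh on every node and restart the daemons. The install path below is an assumption; point it at wherever 6.0_21 is unpacked on your machines.

    # conf/hadoop-env.sh (hypothetical install path for the newer JDK)
    export JAVA_HOME=/usr/java/jdk1.6.0_21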


On 11/16/10 4:58 PM, "common-user-digest-help@hadoop.apache.org"
<common-user-digest-help@hadoop.apache.org> wrote:

> From: Greg Langmead <glangmead@sdl.com>
> Date: Tue, 16 Nov 2010 17:50:17 -0500
> To: <common-user@hadoop.apache.org>
> Subject: Problem identifying cause of a failed job
> 
> Newbie alert.
> 
> I have a Pig script I tested on small data and am now running it on a larger
> data set (85GB). My cluster is two machines right now, each with 16 cores
> and 32G of ram. I configured Hadoop to have 15 tasktrackers on each of these
> nodes. One of them is the namenode, one is the secondary namenode. I'm
> using Pig 0.7.0 and Hadoop 0.20.2 with Java 1.6.0_18 on Linux Fedora Core
> 12, 64-bit.
> 
> My Pig job starts, and eventually a reduce task fails. I'd like to find out
> why. Here's what I know:
> 
> The webUI lists the failed reduce tasks and indicates this error:
> 
> java.io.IOException: Task process exit with nonzero status of 134.
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 
> The userlog userlogs/attempt_201011151350_0001_r_000063_0/stdout says this:
> 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007ff74158463c, pid=27109, tid=140699912791824
> #
> # JRE version: 6.0_18-b07
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 )
> # Problematic frame:
> # V  [libjvm.so+0x62263c]
> [thread 140699484784400 also had an error]
> #
> # An error report file with more information is saved as:
> # /tmp/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201011151350_0001/attempt_201011151350_0001_r_000063_0/work/hs_err_pid27109.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> 
> My mapred-site.xml already includes this:
> 
> <property>
>   <name>keep.failed.task.files</name>
>   <value>true</value>
> </property>
> 
> So I was hoping that the file hs_err_pid27109.log would exist but it
> doesn't. I was sure to check the /tmp dir on both tasktrackers. In fact
> there is no dir
> 
>   jobcache/job_201011151350_0001/attempt_201011151350_0001_r_000063_0
> 
> only
> 
>   jobcache/job_201011151350_0001/attempt_201011151350_0001_r_000063_0.cleanup
> 
> I'd like to find the source of the segfault. Can anyone point me in the
> right direction?
> 
> Of course let me know if you need more information!
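A couple of things worth trying to recover the crash report (a sketch, not something verified on your setup). Exit status 134 usually means the child JVM aborted (128 + SIGABRT) after its fatal error handler ran, and as your log shows, the hs_err file is written into the task attempt's work directory under the tasktracker's local dir, which normally disappears with the attempt.

    # Search the tasktracker local dirs on both nodes; the path assumes the
    # default /tmp/hadoop-hadoop layout visible in your log excerpt.
    find /tmp/hadoop-hadoop/mapred/local -name 'hs_err_pid*.log' 2>/dev/null

    # Or have HotSpot write crash reports somewhere that survives task cleanup
    # by adding an -XX:ErrorFile option to mapred.child.java.opts, for example:
    #   -XX:ErrorFile=/tmp/hs_err_pid%p.log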

