hadoop-common-user mailing list archives

From Kiss Tibor <kiss.ti...@gmail.com>
Subject Re: Problem identifying cause of a failed job
Date Mon, 22 Nov 2010 13:42:15 GMT
Hi!

I had a similar issue myself, with just a simple distcp from s3n to HDFS of
a small file (< 10 MB) that I wanted to copy.

If I start an m1.small instance it works; if I start an m1.large I always
get this error in the tasktracker. See the attached logfile.
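
For context, the copy was essentially of this form (the bucket and path
names here are placeholders, not the real ones):

  hadoop distcp s3n://my-bucket/path/small-file /user/tibor/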

Unfortunately, I am already on the latest JDK version here.

Tibor

On Wed, Nov 17, 2010 at 6:44 PM, Matt Pouttu-Clarke <
Matt.Pouttu-Clarke@icrossing.com> wrote:

> We were getting SIGSEGV and fixed it by upgrading the JVM.  We are using
> 1.6.0_21 currently.
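>
> It is worth double-checking on every node that the JVM Hadoop actually
> launches is the upgraded one (java -version is standard; the hadoop-env.sh
> path is just where 0.20 keeps it by default):
>
>   $ java -version
>   $ grep JAVA_HOME conf/hadoop-env.sh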
>
> On Nov 16, 2010, at 3:50 PM, "Greg Langmead" <glangmead@sdl.com> wrote:
>
>> Newbie alert.
>>
>> I have a Pig script I tested on small data and am now running it on a
>> larger data set (85 GB). My cluster is two machines right now, each with
>> 16 cores and 32 GB of RAM. I configured Hadoop to run 15 task slots on
>> each of these nodes. One of them is the namenode, the other is the
>> secondary namenode. I’m using Pig 0.7.0 and Hadoop 0.20.2 with Java
>> 1.6.0_18 on Linux Fedora Core 12, 64-bit.
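>>
>> (For reference, the slot counts come from the standard 0.20 properties in
>> mapred-site.xml; a minimal sketch, assuming the 15 slots are 15 map and
>> 15 reduce slots per node:)
>>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>15</value>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>15</value>
>>   </property>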
>>
>> My Pig job starts, and eventually a reduce task fails. I’d like to find
>> out
>> why. Here’s what I know:
>>
>> The webUI lists the failed reduce tasks and indicates this error:
>>
>> java.io.IOException: Task process exit with nonzero status of 134.
>>   at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
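>>
>> (Side note: 134 = 128 + 6, so the child JVM died on SIGABRT, which
>> HotSpot raises after writing its fatal-error report; bash can decode the
>> status:)
>>
>>   $ kill -l 134
>>   ABRT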
>>
>> The task’s stdout log, userlogs/attempt_201011151350_0001_r_000063_0/stdout,
>> says this:
>>
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x00007ff74158463c, pid=27109, tid=140699912791824
>> #
>> # JRE version: 6.0_18-b07
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64)
>> # Problematic frame:
>> # V  [libjvm.so+0x62263c]
>> [thread 140699484784400 also had an error]
>> #
>> # An error report file with more information is saved as:
>> # /tmp/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_201011151350_0001/attempt_201011151350_0001_r_000063_0/work/hs_err_pid27109.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>> #
>>
>> My mapred-site.xml already includes this:
>>
>> <property>
>>   <name>keep.failed.task.files</name>
>>   <value>true</value>
>> </property>
>>
>> So I was hoping that the file hs_err_pid27109.log would exist, but it
>> doesn’t. I made sure to check the /tmp dir on both tasktrackers. In fact
>> there is no directory
>>
>>   jobcache/job_201011151350_0001/attempt_201011151350_0001_r_000063_0
>>
>> only
>>
>>   jobcache/job_201011151350_0001/attempt_201011151350_0001_r_000063_0.cleanup
>>
>> I’d like to find the source of the segfault. Can anyone point me in the
>> right direction?
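>>
>> (One thing I may try, assuming -XX:ErrorFile and mapred.child.java.opts
>> behave as documented, is pinning the crash report to a stable location;
>> the heap size and log directory below are only examples:)
>>
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx1024m -XX:ErrorFile=/var/log/hadoop/hs_err_pid%p.log</value>
>>   </property>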
>>
>> Of course let me know if you need more information!
>>
>> Greg Langmead | Senior Research Scientist | SDL Language Weaver
>> (t) +1 310 437 7300
>>
