hadoop-common-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: Task process exit with nonzero status of 1
Date Thu, 24 Sep 2009 18:37:07 GMT
> > A little more background.  This job was working fine for weeks, running
> > hourly, and then failed on Saturday morning and hasn't worked since.

Any chance that ulimit (mapred.child.ulimit) got enabled?

Koji
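
One way to check the question above is to look for the property in the cluster's job configuration. The following is a minimal sketch, not a definitive tool: it assumes the standard Hadoop `<configuration>/<property>` XML layout, and the `SAMPLE` contents and the `read_property` helper are hypothetical, made up for illustration. On a real node you would read whichever site file your installation uses instead of the inline string.

```python
import xml.etree.ElementTree as ET


def read_property(config_xml, name):
    """Return the value of a Hadoop-style <property>, or None if it is not set."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical configuration contents, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>mapred.child.ulimit</name>
    <value>1048576</value>
  </property>
</configuration>"""

ulimit = read_property(SAMPLE, "mapred.child.ulimit")
if ulimit is None:
    print("mapred.child.ulimit is not set in this file")
else:
    print("mapred.child.ulimit =", ulimit)
```

If the property turns out to be set, it is worth comparing it with the heap size passed to the child JVM, since a limit that is too small could plausibly stop the JVM from starting at all.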


On 9/24/09 11:24 AM, "Marc Limotte" <mlimotte@feeva.com> wrote:

> Hi Todd.
> 
> No userlogs seem to be created.  I'm guessing, because the map task never
> actually starts.
> 
> I don't see any other errors in the tasktracker log, other than the one I put
> in the first message ("java.io.IOException: Task process exit with nonzero
> status of 1...").  I've included the output from one of the nodes' tasktracker
> logs below.
> 
> Any other suggestions?
> 
> Marc
> 
> 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's
> state:UNASSIGNED
> 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
> launch : attempt_200909221656_0006_m_000003_0
> 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_200909221656_0006_m_000003_0
> 2009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
> constructed JVM ID: jvm_200909221656_0006_m_-1451805198
> 2009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
> jvm_200909221656_0006_m_-1451805198 spawned.
> 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200909221656_0006_m_-1451805198 exited. Number of tasks it ran: 0
> 2009-09-24 18:15:37,512 WARN org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_m_000003_0 Child Error
> java.io.IOException: Task process exit with nonzero status of 1.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_m_000003_0 done; removing files.
> 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
> : current free slots : 2
> 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000001_0 task's
> state:UNASSIGNED
> 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
> launch : attempt_200909221656_0006_r_000001_0
> 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_200909221656_0006_r_000001_0
> 2009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
> constructed JVM ID: jvm_200909221656_0006_r_788502072
> 2009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
> jvm_200909221656_0006_r_788502072 spawned.
> 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200909221656_0006_r_788502072 exited. Number of tasks it ran: 0
> 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_r_000001_0 Child Error
> java.io.IOException: Task process exit with nonzero status of 1.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 2009-09-24 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_r_000001_0 done; removing files.
> 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
> : current free slots : 2
> 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000002_1 task's
> state:UNASSIGNED
> 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
> launch : attempt_200909221656_0006_m_000002_1
> 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_200909221656_0006_m_000002_1
> 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
> constructed JVM ID: jvm_200909221656_0006_m_-2120349138
> 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
> jvm_200909221656_0006_m_-2120349138 spawned.
> 2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0
> 2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_m_000002_1 Child Error
> java.io.IOException: Task process exit with nonzero status of 1.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_m_000002_1 done; removing files.
> 2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
> : current free slots : 2
> 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's
> state:UNASSIGNED
> 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
> launch : attempt_200909221656_0006_r_000000_1
> 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_200909221656_0006_r_000000_1
> 2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner
> constructed JVM ID: jvm_200909221656_0006_r_-1417908695
> 2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
> jvm_200909221656_0006_r_-1417908695 spawned.
> 2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_r_000000_1 Child Error
> java.io.IOException: Task process exit with nonzero status of 1.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0
> 2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200909221656_0006_r_000000_1 done; removing files.
> 2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot
> : current free slots : 2
> 2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_200909221656_0006
> 
> 
> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Thursday, September 24, 2009 10:19 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Task process exit with nonzero status of 1
> 
> Hi Marc,
> 
> Exit status 1 usually means some kind of controlled exit by the mapreduce
> child task. Things like JVM crashes usually are indicated by other exit
> codes (134 seems to be the code most commonly reported).
> 
> If you look at the stderr and stdout from your task (in the userlogs/
> directory on the task tracker that ran them) do you see any output?
> Additionally, is there anything in the logs for the task tracker itself?
> That log is hadoop.log.dir/hadoop-<username>-tasktracker*log
> 
> If that log is pretty long, try grepping for WARN, ERROR, or Exception
> 
> -Todd
> 
> On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte <mlimotte@feeva.com> wrote:
> 
>> Thanks for the suggestion, Edward. I only upgraded the JVM after the
>> problem occurred to see if it would help, but it made no difference.
>> 
>> Marc
>> 
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
>> Sent: Thursday, September 24, 2009 7:50 AM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Task process exit with nonzero status of 1
>> 
>> On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte <mlimotte@feeva.com> wrote:
>>> I'm seeing this error when I try to run my job.
>>> 
>>> java.io.IOException: Task process exit with nonzero status of 1.
>>>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>> 
>>> From what I can find by doing some Google searches, this means the mapred
>>> task JVM has crashed.  Not many suggestions about what to do about it.  Some
>>> suggestions about increasing max heap.  I tried that, although I don't think
>>> that's the issue because it's not a particularly memory intensive process
>>> and I've even tried it with a super small input data set of only a few
>>> records.  Still see the same issue.
>>> 
>>> Can't find anything else in the logs.  I don't think my task even
>>> started, because there are no user logs created at all. Seems to fail during
>>> Job Setup.
>>> 
>>> A little more background.  This job was working fine for weeks, running
>>> hourly, and then failed on Saturday morning and hasn't worked since.
>>> Obviously, I looked for something that changed at that point, but no one
>>> was working at that time... can't find anything that changed.  I tried the
>>> job with different input data sets, doesn't seem to matter, unless I run it
>>> with no data at all.  The job does run with no input data, but if I have
>>> even a few input records it fails - doesn't seem to matter which records.  I
>>> suspected some corruption in HDFS, but I was able to extract the data from
>>> HDFS (hadoop dfs -get ...) and the data looks ok.  I also copied this data
>>> set to our TEST cluster and ran the job there... and it WORKED!
>>> 
>>> Ran one of our other jobs and it failed as well, so it doesn't seem to be
>>> job specific either; looks like every job fails the same way.
>>> 
>>> Did a complete reboot of the cluster - no impact.
>>> 
>>> We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit.
>>> 
>>> Any suggestions on what could be wrong or where to look for more
>>> information would be appreciated.
>>> 
>>> 
>>> 
>>> Marc Limotte
>>> Feeva Technology
>>> 
>>> PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR
>> ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION
>> PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE,
>> DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY
>> PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
>> PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
>>> 
>> Just a shot in the dark....
>> 
>> Did you update Java recently?
>> 
>> 
>> http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/
>> 
> 

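Todd's distinction in the quoted reply between exit status 1 and codes such as 134 can be illustrated with a small sketch. It assumes the common Unix convention that a process killed by signal N is reported with status 128 + N, which would make 134 correspond to signal 6 (SIGABRT on Linux). The `describe_exit_status` helper is hypothetical and not part of Hadoop.

```python
import signal


def describe_exit_status(status):
    """Describe an exit status using the 128 + signal-number convention."""
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            # Look up the platform's name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = "signal %d" % signum
        return "killed by " + name
    return "exited with error code %d" % status


for code in (1, 134):
    print(code, "->", describe_exit_status(code))
```

Read this way, a status of 1 points at the child process choosing to exit (for example a failed startup) rather than at a crash of the JVM itself.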

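The suggestion in the quoted reply to grep the tasktracker log for WARN, ERROR, or Exception can be sketched in a few lines. This is only an illustration under stated assumptions: the `suspicious_lines` helper is hypothetical, and `SAMPLE_LOG` reuses a handful of lines from the log excerpt in this thread rather than reading a real log file.

```python
import re

# Match the log levels WARN and ERROR as whole words, or any mention of an exception.
PATTERN = re.compile(r"\b(?:WARN|ERROR)\b|Exception")


def suspicious_lines(lines):
    """Return the lines that mention WARN, ERROR, or Exception."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


SAMPLE_LOG = [
    "2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-2120349138 spawned.",
    "2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 Child Error",
    "java.io.IOException: Task process exit with nonzero status of 1.",
    "        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)",
    "2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 done; removing files.",
]

for line in suspicious_lines(SAMPLE_LOG):
    print(line)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`; the function only iterates over lines, so any iterable of strings works.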