hadoop-common-user mailing list archives

From David Swift <davidswiftm...@charter.net>
Subject Re: Debugging "Child Error" in Distributed Map Job
Date Mon, 05 Apr 2010 19:32:36 GMT

I think I found it - there's a user logs directory I missed before.  I'm
getting this:

/usr/java/jdk1.6.0_17/jre/bin/java: error while loading shared libraries:
libdl.so.2: failed to map segment from shared object: Cannot allocate memory

Sorry for missing that earlier.
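
For anyone else chasing the same thing, here is a minimal sketch of how one might collect the per-attempt stderr output from that user logs directory. It assumes a layout of `<userlogs dir>/<attempt id>/stderr`, which is what I see on my nodes but may differ by Hadoop version or configuration. The function names `read_attempt_stderr` and `nonempty_stderr` are made up for illustration and are not part of Hadoop.

```python
import os


def read_attempt_stderr(userlogs_dir, attempt_id):
    """Return the stderr text for one task attempt, or None if there is no such file.

    Assumes the (hypothetical here) layout <userlogs_dir>/<attempt_id>/stderr.
    """
    path = os.path.join(userlogs_dir, attempt_id, "stderr")
    if not os.path.isfile(path):
        return None
    # errors="replace" keeps the scan going if a log contains odd bytes.
    with open(path, "r", errors="replace") as fh:
        return fh.read()


def nonempty_stderr(userlogs_dir):
    """Map attempt ID -> stderr text for every attempt that wrote anything to stderr."""
    found = {}
    for attempt_id in sorted(os.listdir(userlogs_dir)):
        text = read_attempt_stderr(userlogs_dir, attempt_id)
        if text and text.strip():
            found[attempt_id] = text
    return found
```

Calling `nonempty_stderr` on the user logs directory of a TaskTracker node returns only the attempts that actually printed something, which is where a loader message like the one above would show up. The "Number of tasks it ran: 0" line in the TaskTracker log is consistent with the child JVM failing before it ever started, so the TaskTracker log itself would not contain the reason.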


David Swift wrote:
> 
> I'm seeing this error message sequence over and over again in the
> hadoop-msp-tasktracker-hadoop-<nodename>.log file on all of my
> map-reducing nodes:
> 
> 2010-04-05 14:58:52,047 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201004051839_0012_m_000168_3
> task's state:UNASSIGNED
> 2010-04-05 14:58:52,047 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201004051839_0012_m_000168_3
> 2010-04-05 14:58:52,047 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 2 and trying to launch
> attempt_201004051839_0012_m_000168_3
> 2010-04-05 14:58:52,079 WARN org.mortbay.log: /tasklog:
> java.io.IOException: Closed
> 2010-04-05 14:58:52,088 WARN org.mortbay.log: /tasklog:
> java.io.IOException: Closed
> 2010-04-05 14:58:52,099 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201004051839_0012_m_-958381544
> 2010-04-05 14:58:52,099 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201004051839_0012_m_-958381544 spawned.
> 2010-04-05 14:58:52,137 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201004051839_0012_m_-958381544 exited. Number of tasks it ran: 0
> 2010-04-05 14:58:52,137 WARN org.apache.hadoop.mapred.TaskRunner:
> attempt_201004051839_0012_m_000168_3 Child Error
> java.io.IOException: Task process exit with nonzero status of 127.
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
> 2010-04-05 14:58:55,146 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201004051839_0012_m_000168_3 done; removing files.
> 2010-04-05 14:58:55,147 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 
> I can find no log file anywhere that tells me *why* there was a Child
> Error or why it exited with a 127 status.  In my client, I'm seeing this
> over and over and it looks pretty likely it's related:
> 
> Apr 5, 2010 7:07:27 PM org.apache.hadoop.mapred.JobClient getTaskLogs
> WARNING: Error reading task
> outputhttp://<each-of-my-cluster-hosts>:50060/tasklog?plaintext=true&taskid=attempt_201004051839_0021_m_000167_1&filter=stdout
> Apr 5, 2010 7:07:27 PM org.apache.hadoop.mapred.JobClient getTaskLogs
> WARNING: Error reading task
> outputhttp://<each-of-my-cluster-hosts>:50060/tasklog?plaintext=true&taskid=attempt_201004051839_0021_m_000167_1&filter=stderr
> 
> Thanks for any help you can provide.  Even a pointer to some log file I'm
> missing would be great.
> 
> Thanks again,
> David Swift
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/Debugging-%22Child-Error%22-in-Distributed-Map-Job-tp28143722p28143897.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.