hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Johannes Zillmann <jzillm...@googlemail.com>
Subject Re: hanging task
Date Fri, 07 May 2010 15:59:01 GMT
Sorry, forgot to say that on map task runs about more then one hour.
So i do not think its the progress thing. So there are successful map tasks and those which
are failing failing long after the 10 min timeout.
And the question is why there is not any stack of the map task visibly in the thread dump.
Is it swallowed or is hadoop stuck right after the mapper is done ?

Johannes


On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:

> Hello Johannes,
> 
> I had a similar problem and I used the first approach suggested by Joseph
> i.e. to report the status back. I used progress() method as well as the
> setStatus() method. progress() method javadoc link is given below.
> 
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
> 
> This should solve your problem. If you come to know of the reason why it is
> taking more time than you expected, please mention it here.
> 
> Regards,
> Raghava.
> 
> On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <cryptcom@gmail.com> wrote:
> 
>> You need to either report status or increment a counter from within
>> your task. In your Java code there is a little trick to help the job
>> be “aware” within the cluster of tasks that are not dead but just
>> working hard.  During execution of a task there is no built in
>> reporting that the job is running as expected if it is not writing
>> out.  So this means that if your tasks are taking up a lot of time
>> doing work it is possible the cluster will see that task as failed
>> (based on the mapred.task.tracker.expiry.interval setting).
>> 
>> Have no fear there is a way to tell cluster that your task is doing
>> just fine.  You have 2 ways todo this you can either report the status
>> or increment a counter.  Both of these will cause the task tracker to
>> properly know the task is ok and this will get seen by the jobtracker
>> in turn.  Both of these options are explained in the JavaDoc
>> 
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
>> 
>> Here are some pointers in general you may find useful
>> 
>> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
>> 
>> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
>> <jzillmann@googlemail.com> wrote:
>>> Hi hadoop folks,
>>> 
>>> i'm encountering following problem on a 4 node cluster running
>> hadoop-0.20.2.
>>> 
>>> Its a map only job reading about 9 GB data from outside of hadoop. 31 map
>> tasks at all while 12 map tasks running at a time.
>>> The first wave of mappers finishes successfully.
>>> Later on the first tasks are failing and they do that shortly before
>> finishing:
>>> Task attempt_201005061210_0002_m_000001_0 failed to report status for 602
>> seconds. Killing!
>>> Task attempt_201005061210_0002_m_000001_1 failed to report status for 600
>> seconds. Killing!
>>> Task attempt_201005061210_0002_m_000001_2 failed to report status for 602
>> seconds. Killing!
>>> 
>>> The unusual is that i do not find any signs of the job code in the thread
>> dump the tasktracker takes automatically:
>>> ----------------------------------------------------------------
>>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201004301437_0050_m_000001_0: Task
>> attempt_201004301437_0050_m_000001_0 failed to report status for 601
>> seconds. Killing!
>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>> Process Thread Dump: lost task
>>> 34 active threads
>>> Thread 29555 (process reaper):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495 spawned.):
>>> State: WAITING
>>> Blocked count: 1
>>> Waited count: 2
>>> Waiting on java.lang.UNIXProcess@34b56f69
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>> Thread 29550 (Thread-15193):
>>> State: WAITING
>>> Blocked count: 13
>>> Waited count: 14
>>> Waiting on java.lang.Object@72c09161
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>> Thread 29501 (process reaper):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.):
>>> State: WAITING
>>> Blocked count: 1
>>> Waited count: 2
>>> Waiting on java.lang.UNIXProcess@56175cd4
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>> Thread 29496 (Thread-15162):
>>> State: WAITING
>>> Blocked count: 13
>>> Waited count: 14
>>> Waiting on java.lang.Object@3b916de2
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>> Thread 29225 (process reaper):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314 spawned.):
>>> State: WAITING
>>> Blocked count: 1
>>> Waited count: 2
>>> Waiting on java.lang.UNIXProcess@39874d3b
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>> Thread 29217 (process reaper):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
>>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
>>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
>>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636 spawned.):
>>> State: WAITING
>>> Blocked count: 1
>>> Waited count: 2
>>> Waiting on java.lang.UNIXProcess@2c39220f
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
>>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
>>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
>>> 
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
>>> 
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
>>> Thread 29208 (Thread-15014):
>>> State: WAITING
>>> Blocked count: 18
>>> Waited count: 14
>>> Waiting on java.lang.Object@7ddbbae2
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>> Thread 29203 (Thread-15011):
>>> State: WAITING
>>> Blocked count: 13
>>> Waited count: 14
>>> Waiting on java.lang.Object@2dac3f6f
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
>>> Thread 26714 (IPC Client (47) connection to /10.0.11.64:9001 from
>> hadoop):
>>> State: TIMED_WAITING
>>> Blocked count: 10075
>>> Waited count: 10075
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
>>>   org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
>>> Thread 33 (Directory/File cleanup thread):
>>> State: WAITING
>>> Blocked count: 2
>>> Waited count: 615
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4d6c68b3
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>> 
>> org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
>>> Thread 9 (taskCleanup):
>>> State: WAITING
>>> Blocked count: 30
>>> Waited count: 92
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3298407f
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317)
>>>   java.lang.Thread.run(Thread.java:619)
>>> Thread 32 (TaskLauncher for task):
>>> State: WAITING
>>> Blocked count: 127
>>> Waited count: 123
>>> Waiting on java.util.LinkedList@c33377
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>> 
>> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>> Thread 31 (TaskLauncher for task):
>>> State: WAITING
>>> Blocked count: 3237
>>> Waited count: 3147
>>> Waiting on java.util.LinkedList@67001629
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>> 
>> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
>>> Thread 30 (Map-events fetcher for all reduce tasks on
>> tracker_node1t:localhost/127.0.0.1:48722):
>>> State: WAITING
>>> Blocked count: 712
>>> Waited count: 804
>>> Waiting on java.util.TreeMap@1fec8cf1
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>> 
>> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606)
>>> Thread 28 (IPC Server handler 7 on 48722):
>>> State: WAITING
>>> Blocked count: 10
>>> Waited count: 3199
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 27 (IPC Server handler 6 on 48722):
>>> State: WAITING
>>> Blocked count: 265
>>> Waited count: 3207
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 26 (IPC Server handler 5 on 48722):
>>> State: WAITING
>>> Blocked count: 296
>>> Waited count: 3206
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 25 (IPC Server handler 4 on 48722):
>>> State: WAITING
>>> Blocked count: 199
>>> Waited count: 3202
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 24 (IPC Server handler 3 on 48722):
>>> State: WAITING
>>> Blocked count: 216
>>> Waited count: 3201
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 23 (IPC Server handler 2 on 48722):
>>> State: WAITING
>>> Blocked count: 119
>>> Waited count: 3199
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 22 (IPC Server handler 1 on 48722):
>>> State: WAITING
>>> Blocked count: 193
>>> Waited count: 3207
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 21 (IPC Server handler 0 on 48722):
>>> State: WAITING
>>> Blocked count: 285
>>> Waited count: 3203
>>> Waiting on
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
>>> Stack:
>>>   sun.misc.Unsafe.park(Native Method)
>>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> 
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>> 
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
>>> Thread 18 (IPC Server listener on 48722):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>>>   org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
>>> Thread 20 (IPC Server Responder):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>   org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
>>> Thread 17 (Timer-0):
>>> State: TIMED_WAITING
>>> Blocked count: 1
>>> Waited count: 12764
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.util.TimerThread.mainLoop(Timer.java:509)
>>>   java.util.TimerThread.run(Timer.java:462)
>>> Thread 16 (295726648@qtp0-0 - Acceptor0
>> SelectChannelConnector@0.0.0.0:50060):
>>> State: RUNNABLE
>>> Blocked count: 1310
>>> Waited count: 1
>>> Stack:
>>>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>> 
>> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
>>>   org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
>>> 
>> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
>>> 
>> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
>>> 
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>> Thread 4 (Signal Dispatcher):
>>> State: RUNNABLE
>>> Blocked count: 0
>>> Waited count: 0
>>> Stack:
>>> Thread 3 (Finalizer):
>>> State: WAITING
>>> Blocked count: 1337
>>> Waited count: 1336
>>> Waiting on java.lang.ref.ReferenceQueue$Lock@5e77533b
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>   java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>> Thread 2 (Reference Handler):
>>> State: WAITING
>>> Blocked count: 1544
>>> Waited count: 1336
>>> Waiting on java.lang.ref.Reference$Lock@46efbdf1
>>> Stack:
>>>   java.lang.Object.wait(Native Method)
>>>   java.lang.Object.wait(Object.java:485)
>>>   java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>> Thread 1 (main):
>>> State: RUNNABLE
>>> Blocked count: 152594
>>> Waited count: 279661
>>> Stack:
>>>   sun.management.ThreadImpl.getThreadInfo0(Native Method)
>>>   sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
>>>   sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
>>> 
>> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
>>> 
>> org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
>>> 
>> org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323)
>>> 
>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
>>>   org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
>>>   org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
>>> 
>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker: About
>> to purge task: attempt_201004301437_0050_m_000001_0
>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
>> addFreeSlot : current free slots : 1
>>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner:
>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map ID
>> attempt_201004301437_0050_m_000001_0 not found in cache
>>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>> jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0
>>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner:
>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
>> LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0 task's
>> state:FAILED_UNCLEAN
>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: Trying
>> to launch : attempt_201004301437_0050_m_000001_0
>>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker: In
>> TaskLauncher, current free slots : 1 and trying to launch
>> attempt_201004301437_0050_m_000001_0
>>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker: JVM
>> with ID: jvm_201004301437_0050_m_-1095145752 given task:
>> attempt_201004301437_0050_m_000001_0
>>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201004301437_0050_m_000001_0 0.0%
>>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker:
>> attempt_201004301437_0050_m_000001_0 0.0% cleanup
>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker: Task
>> attempt_201004301437_0050_m_000001_0 is done.
>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
>> reported output size for attempt_201004301437_0050_m_000001_0  was 0
>>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner:
>> attempt_201004301437_0050_m_000001_0 done; removing files.
>>> ----------------------------------------------------------------
>>> 
>>> The syslog of the attempt just looks like:
>>> ----------------------------------------------------------------
>>> 2010-05-04 23:48:03,365 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=MAP, sessionId=
>>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData:
>> Failed to get version from the manifest file of jar
>> 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar'
>> : null
>>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask:
>> numReduceTasks: 0
>>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader:
>> Loaded the native-hadoop library
>>> 2010-05-04 23:48:04,873 INFO
>> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
>> initialized native-zlib library
>>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool: Got
>> brand-new compressor
>>> 2010-05-05 00:59:08,753 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=CLEANUP, sessionId=
>>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner:
>> Runnning cleanup for the task
>>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner:
>> Task:attempt_201004301437_0050_m_000001_0 is done. And is in the process of
>> commiting
>>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner: Task
>> 'attempt_201004301437_0050_m_000001_0' done.
>>> ----------------------------------------------------------------
>>> 
>>> Any ideas ?
>>> 
>>> best regards
>>> Johannes
>> 
>> 
>> 
>> --
>> /*
>> Joe Stein
>> http://www.linkedin.com/in/charmalloc
>> */
>> 


Mime
View raw message