hadoop-common-user mailing list archives

From Raghava Mutharaju <m.vijayaragh...@gmail.com>
Subject Re: hanging task
Date Sat, 08 May 2010 06:34:37 GMT
Hi Mike,

       I think GC stands for Garbage Collection, right? How can one look at
the GC activity?

Regards,
Raghava.

On Fri, May 7, 2010 at 12:35 PM, Michael Segel <michael_segel@hotmail.com> wrote:

>
> There are a couple of things...
>
> 1) Inside the map() method, if you're taking longer than the timeout, you
> will fail.
> 2) You could also fail if you run into GC problems that cause timeout
> failures.
>
> What sometimes happens in long-running jobs is that you tend to see GC
> pauses take longer as the job runs.
>
> Do you log a start and end time in each iteration of map()? This will tell
> you whether the time spent in an individual map() call is too long.
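A minimal sketch of that timing idea in plain Java; the class name and WARN_MILLIS threshold are illustrative choices, not Hadoop API:

```java
// Sketch: wrap the body of map() with start/end timestamps so that
// slow calls show up in the task logs. WARN_MILLIS and the class
// name are illustrative choices, not part of the Hadoop API.
import java.util.logging.Logger;

class TimedMapSketch {
    private static final Logger LOG = Logger.getLogger("TimedMapSketch");
    static final long WARN_MILLIS = 60_000; // flag calls longer than a minute

    // Stand-in for the real map(key, value, output, reporter) body.
    static long timeMapCall(Runnable mapBody) {
        long start = System.currentTimeMillis();
        mapBody.run();
        long elapsed = System.currentTimeMillis() - start;
        if (elapsed > WARN_MILLIS) {
            LOG.warning("map() call took " + elapsed + " ms");
        }
        return elapsed;
    }
}
```

Grepping the task logs for the warning then shows exactly which map() calls are approaching the timeout.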
>
> Also look at GC. Trying to go through all the logs can really drive one to
> drink. :-)
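One way to look at GC (a sketch, not the only option) is to pass verbose GC flags to the task JVMs via mapred.child.java.opts; the -Xmx value and log file path below are illustrative:

```xml
<!-- mapred-site.xml: pass GC-logging flags to every task JVM.
     The -Xmx value and log file path are illustrative. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/@taskid@.gc</value>
</property>
```

The GC log then shows the pause time of each collection; long or frequent full GCs right before the timeout are the smoking gun.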
>
> HTH
>
> -Mike
>
>
> > Subject: Re: hanging task
> > From: jzillmann@googlemail.com
> > Date: Fri, 7 May 2010 17:59:01 +0200
> > To: common-user@hadoop.apache.org
> >
> > Sorry, I forgot to say that one map task runs for more than an hour,
> > so I do not think it's the progress thing. There are successful map
> > tasks, and the failing ones fail long after the 10-minute timeout.
> > The question is why there is no stack trace of the map task visible in
> > the thread dump.
> > Is it swallowed, or is Hadoop stuck right after the mapper is done?
> >
> > Johannes
> >
> >
> > On May 7, 2010, at 5:16 PM, Raghava Mutharaju wrote:
> >
> > > Hello Johannes,
> > >
> > > I had a similar problem, and I used the first approach suggested by
> > > Joseph, i.e. reporting the status back. I used the progress() method as
> > > well as the setStatus() method. The javadoc link for progress() is given
> > > below.
> > >
> > >
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Progressable.html#progress%28%29
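A minimal sketch of that pattern; ProgressReporter here is a trimmed stand-in for the real org.apache.hadoop.mapred.Reporter interface, just enough to show the call sites, and REPORT_EVERY is an illustrative interval:

```java
// Sketch of reporting progress from a long-running map() loop.
// ProgressReporter is a trimmed stand-in for the real
// org.apache.hadoop.mapred.Reporter interface; REPORT_EVERY is an
// illustrative interval, not a Hadoop constant.
interface ProgressReporter {
    void progress();
    void setStatus(String status);
}

class ProgressSketch {
    static final int REPORT_EVERY = 1000;

    // Returns how many times progress was reported, for checking.
    static int processRecords(int numRecords, ProgressReporter reporter) {
        int reports = 0;
        for (int i = 0; i < numRecords; i++) {
            // ... expensive per-record work goes here ...
            if (i % REPORT_EVERY == 0) {
                reporter.progress();                  // resets the timeout clock
                reporter.setStatus("processed " + i); // visible in the web UI
                reports++;
            }
        }
        return reports;
    }
}
```

The key point is that progress() is cheap, so calling it every N records (rather than every record) is just a minor optimization, not a requirement.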
> > >
> > > This should solve your problem. If you find out why it is taking more
> > > time than you expected, please mention it here.
> > >
> > > Regards,
> > > Raghava.
> > >
> > > On Fri, May 7, 2010 at 10:31 AM, Joseph Stein <cryptcom@gmail.com>
> wrote:
> > >
> > >> You need to either report status or increment a counter from within
> > >> your task. In your Java code there is a little trick to let the
> > >> cluster know that a task is not dead but just working hard. During
> > >> execution of a task there is no built-in reporting that the job is
> > >> running as expected unless it is writing output. This means that if
> > >> your tasks spend a lot of time doing work, the cluster may see them
> > >> as failed (based on the mapred.task.tracker.expiry.interval setting).
> > >>
> > >> Have no fear, there is a way to tell the cluster that your task is
> > >> doing just fine. You have two ways to do this: you can either report
> > >> the status or increment a counter. Both of these will let the
> > >> tasktracker know the task is OK, which in turn gets seen by the
> > >> jobtracker. Both of these options are explained in the JavaDoc:
> > >>
> > >>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
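A quick sketch of the counter option; CounterReporter is a trimmed stand-in for org.apache.hadoop.mapred.Reporter, and the group/counter names are illustrative:

```java
// Sketch of the counter alternative: incrementing a counter also
// tells the tasktracker the task is alive. CounterReporter is a
// trimmed stand-in for org.apache.hadoop.mapred.Reporter, and the
// group/counter names are illustrative.
import java.util.HashMap;
import java.util.Map;

interface CounterReporter {
    void incrCounter(String group, String counter, long amount);
}

class CounterSketch {
    static Map<String, Long> processRecords(int numRecords) {
        Map<String, Long> counters = new HashMap<>();
        CounterReporter reporter = (group, counter, amount) ->
                counters.merge(group + ":" + counter, amount, Long::sum);
        for (int i = 0; i < numRecords; i++) {
            // ... per-record work goes here ...
            reporter.incrCounter("MyJob", "RECORDS_PROCESSED", 1L);
        }
        return counters;
    }
}
```

A nice side effect of the counter route is that the totals show up on the job page, so you get a rough progress indicator for free.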
> > >>
> > >> Here are some pointers in general you may find useful
> > >>
> > >>
> http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
> > >>
> > >> On Fri, May 7, 2010 at 10:26 AM, Johannes Zillmann
> > >> <jzillmann@googlemail.com> wrote:
> > >>> Hi hadoop folks,
> > >>>
> > >>> I'm encountering the following problem on a 4-node cluster running
> > >>> hadoop-0.20.2.
> > >>>
> > >>> It's a map-only job reading about 9 GB of data from outside of
> > >>> Hadoop: 31 map tasks in total, with 12 map tasks running at a time.
> > >>> The first wave of mappers finishes successfully.
> > >>> Later on, the first tasks start failing, and they do so shortly
> > >>> before finishing:
> > >>> Task attempt_201005061210_0002_m_000001_0 failed to report status for
> 602
> > >> seconds. Killing!
> > >>> Task attempt_201005061210_0002_m_000001_1 failed to report status for
> 600
> > >> seconds. Killing!
> > >>> Task attempt_201005061210_0002_m_000001_2 failed to report status for
> 602
> > >> seconds. Killing!
> > >>>
> > >>> The unusual thing is that I do not find any sign of the job code in
> > >>> the thread dump that the tasktracker takes automatically:
> > >>> ----------------------------------------------------------------
> > >>> 2010-05-05 00:59:03,515 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> attempt_201004301437_0050_m_000001_0: Task
> > >> attempt_201004301437_0050_m_000001_0 failed to report status for 601
> > >> seconds. Killing!
> > >>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> Process Thread Dump: lost task
> > >>> 34 active threads
> > >>> Thread 29555 (process reaper):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
> > >>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> > >>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> > >>> Thread 29554 (JVM Runner jvm_201004301437_0050_m_1465855495
> spawned.):
> > >>> State: WAITING
> > >>> Blocked count: 1
> > >>> Waited count: 2
> > >>> Waiting on java.lang.UNIXProcess@34b56f69
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> > >>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> > >>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
> > >>>
> > >>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> > >>> Thread 29550 (Thread-15193):
> > >>> State: WAITING
> > >>> Blocked count: 13
> > >>> Waited count: 14
> > >>> Waiting on java.lang.Object@72c09161
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> > >>> Thread 29501 (process reaper):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
> > >>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> > >>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> > >>> Thread 29500 (JVM Runner jvm_201004301437_0050_m_861348535 spawned.):
> > >>> State: WAITING
> > >>> Blocked count: 1
> > >>> Waited count: 2
> > >>> Waiting on java.lang.UNIXProcess@56175cd4
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> > >>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> > >>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
> > >>>
> > >>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> > >>> Thread 29496 (Thread-15162):
> > >>> State: WAITING
> > >>> Blocked count: 13
> > >>> Waited count: 14
> > >>> Waiting on java.lang.Object@3b916de2
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> > >>> Thread 29225 (process reaper):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
> > >>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> > >>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> > >>> Thread 29224 (JVM Runner jvm_201004301437_0050_m_-205889314
> spawned.):
> > >>> State: WAITING
> > >>> Blocked count: 1
> > >>> Waited count: 2
> > >>> Waiting on java.lang.UNIXProcess@39874d3b
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> > >>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> > >>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
> > >>>
> > >>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> > >>> Thread 29217 (process reaper):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>>   java.lang.UNIXProcess.waitForProcessExit(Native Method)
> > >>>   java.lang.UNIXProcess.access$900(UNIXProcess.java:20)
> > >>>   java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132)
> > >>> Thread 29215 (JVM Runner jvm_201004301437_0050_m_-1797469636
> spawned.):
> > >>> State: WAITING
> > >>> Blocked count: 1
> > >>> Waited count: 2
> > >>> Waiting on java.lang.UNIXProcess@2c39220f
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
> > >>>   org.apache.hadoop.util.Shell.runCommand(Shell.java:186)
> > >>>   org.apache.hadoop.util.Shell.run(Shell.java:134)
> > >>>
> > >>
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:335)
> > >>>
> > >>
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:324)
> > >>> Thread 29208 (Thread-15014):
> > >>> State: WAITING
> > >>> Blocked count: 18
> > >>> Waited count: 14
> > >>> Waiting on java.lang.Object@7ddbbae2
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> > >>> Thread 29203 (Thread-15011):
> > >>> State: WAITING
> > >>> Blocked count: 13
> > >>> Waited count: 14
> > >>> Waiting on java.lang.Object@2dac3f6f
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409)
> > >>> Thread 26714 (IPC Client (47) connection to /10.0.11.64:9001 from
> > >> hadoop):
> > >>> State: TIMED_WAITING
> > >>> Blocked count: 10075
> > >>> Waited count: 10075
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>
> org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403)
> > >>>   org.apache.hadoop.ipc.Client$Connection.run(Client.java:445)
> > >>> Thread 33 (Directory/File cleanup thread):
> > >>> State: WAITING
> > >>> Blocked count: 2
> > >>> Waited count: 615
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4d6c68b3
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>
> > >>
> org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.java:89)
> > >>> Thread 9 (taskCleanup):
> > >>> State: WAITING
> > >>> Blocked count: 30
> > >>> Waited count: 92
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3298407f
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:317)
> > >>>   java.lang.Thread.run(Thread.java:619)
> > >>> Thread 32 (TaskLauncher for task):
> > >>> State: WAITING
> > >>> Blocked count: 127
> > >>> Waited count: 123
> > >>> Waiting on java.util.LinkedList@c33377
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>
> > >>
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
> > >>> Thread 31 (TaskLauncher for task):
> > >>> State: WAITING
> > >>> Blocked count: 3237
> > >>> Waited count: 3147
> > >>> Waiting on java.util.LinkedList@67001629
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>
> > >>
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1601)
> > >>> Thread 30 (Map-events fetcher for all reduce tasks on
> > >> tracker_node1t:localhost/127.0.0.1:48722):
> > >>> State: WAITING
> > >>> Blocked count: 712
> > >>> Waited count: 804
> > >>> Waiting on java.util.TreeMap@1fec8cf1
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>
> > >>
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:606)
> > >>> Thread 28 (IPC Server handler 7 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 10
> > >>> Waited count: 3199
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 27 (IPC Server handler 6 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 265
> > >>> Waited count: 3207
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 26 (IPC Server handler 5 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 296
> > >>> Waited count: 3206
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 25 (IPC Server handler 4 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 199
> > >>> Waited count: 3202
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 24 (IPC Server handler 3 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 216
> > >>> Waited count: 3201
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 23 (IPC Server handler 2 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 119
> > >>> Waited count: 3199
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 22 (IPC Server handler 1 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 193
> > >>> Waited count: 3207
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 21 (IPC Server handler 0 on 48722):
> > >>> State: WAITING
> > >>> Blocked count: 285
> > >>> Waited count: 3203
> > >>> Waiting on
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@b4848ae
> > >>> Stack:
> > >>>   sun.misc.Unsafe.park(Native Method)
> > >>>   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> > >>>
> > >>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> > >>>
> > >>
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> > >>>   org.apache.hadoop.ipc.Server$Handler.run(Server.java:939)
> > >>> Thread 18 (IPC Server listener on 48722):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> > >>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> > >>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> > >>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> > >>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> > >>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
> > >>>   org.apache.hadoop.ipc.Server$Listener.run(Server.java:318)
> > >>> Thread 20 (IPC Server Responder):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> > >>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> > >>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> > >>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> > >>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> > >>>   org.apache.hadoop.ipc.Server$Responder.run(Server.java:478)
> > >>> Thread 17 (Timer-0):
> > >>> State: TIMED_WAITING
> > >>> Blocked count: 1
> > >>> Waited count: 12764
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.util.TimerThread.mainLoop(Timer.java:509)
> > >>>   java.util.TimerThread.run(Timer.java:462)
> > >>> Thread 16 (295726648@qtp0-0 - Acceptor0
> > >> SelectChannelConnector@0.0.0.0:50060):
> > >>> State: RUNNABLE
> > >>> Blocked count: 1310
> > >>> Waited count: 1
> > >>> Stack:
> > >>>   sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> > >>>   sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> > >>>   sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> > >>>   sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> > >>>   sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> > >>>
> > >>
> org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:429)
> > >>>
> org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185)
> > >>>
> > >>
> org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
> > >>>
> > >>
> org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707)
> > >>>
> > >>
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> > >>> Thread 4 (Signal Dispatcher):
> > >>> State: RUNNABLE
> > >>> Blocked count: 0
> > >>> Waited count: 0
> > >>> Stack:
> > >>> Thread 3 (Finalizer):
> > >>> State: WAITING
> > >>> Blocked count: 1337
> > >>> Waited count: 1336
> > >>> Waiting on java.lang.ref.ReferenceQueue$Lock@5e77533b
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> > >>>   java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> > >>>   java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> > >>> Thread 2 (Reference Handler):
> > >>> State: WAITING
> > >>> Blocked count: 1544
> > >>> Waited count: 1336
> > >>> Waiting on java.lang.ref.Reference$Lock@46efbdf1
> > >>> Stack:
> > >>>   java.lang.Object.wait(Native Method)
> > >>>   java.lang.Object.wait(Object.java:485)
> > >>>   java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> > >>> Thread 1 (main):
> > >>> State: RUNNABLE
> > >>> Blocked count: 152594
> > >>> Waited count: 279661
> > >>> Stack:
> > >>>   sun.management.ThreadImpl.getThreadInfo0(Native Method)
> > >>>   sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147)
> > >>>   sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123)
> > >>>
> > >>
> org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
> > >>>
> > >>
> org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:203)
> > >>>
> > >>
> org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:1323)
> > >>>
> > >>
> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
> > >>>   org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
> > >>>   org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
> > >>>
> > >>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
> About
> > >> to purge task: attempt_201004301437_0050_m_000001_0
> > >>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> addFreeSlot : current free slots : 1
> > >>> 2010-05-05 00:59:03,520 INFO org.apache.hadoop.mapred.TaskRunner:
> > >> attempt_201004301437_0050_m_000001_0 done; removing files.
> > >>> 2010-05-05 00:59:03,521 INFO org.apache.hadoop.mapred.IndexCache: Map
> ID
> > >> attempt_201004301437_0050_m_000001_0 not found in cache
> > >>> 2010-05-05 00:59:05,012 INFO org.apache.hadoop.mapred.JvmManager: JVM
> :
> > >> jvm_201004301437_0050_m_-205889314 exited. Number of tasks it ran: 0
> > >>> 2010-05-05 00:59:06,521 INFO org.apache.hadoop.mapred.TaskRunner:
> > >> attempt_201004301437_0050_m_000001_0 done; removing files.
> > >>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> LaunchTaskAction (registerTask): attempt_201004301437_0050_m_000001_0
> task's
> > >> state:FAILED_UNCLEAN
> > >>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
> Trying
> > >> to launch : attempt_201004301437_0050_m_000001_0
> > >>> 2010-05-05 00:59:06,637 INFO org.apache.hadoop.mapred.TaskTracker:
In
> > >> TaskLauncher, current free slots : 1 and trying to launch
> > >> attempt_201004301437_0050_m_000001_0
> > >>> 2010-05-05 00:59:08,606 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM
> > >> with ID: jvm_201004301437_0050_m_-1095145752 given task:
> > >> attempt_201004301437_0050_m_000001_0
> > >>> 2010-05-05 00:59:09,222 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> attempt_201004301437_0050_m_000001_0 0.0%
> > >>> 2010-05-05 00:59:09,560 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> attempt_201004301437_0050_m_000001_0 0.0% cleanup
> > >>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
> Task
> > >> attempt_201004301437_0050_m_000001_0 is done.
> > >>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskTracker:
> > >> reported output size for attempt_201004301437_0050_m_000001_0  was 0
> > >>> 2010-05-05 00:59:09,561 INFO org.apache.hadoop.mapred.TaskRunner:
> > >> attempt_201004301437_0050_m_000001_0 done; removing files.
> > >>> ----------------------------------------------------------------
> > >>>
> > >>> The syslog of the attempt just looks like:
> > >>> ----------------------------------------------------------------
> > >>> 2010-05-04 23:48:03,365 INFO
> org.apache.hadoop.metrics.jvm.JvmMetrics:
> > >> Initializing JVM Metrics with processName=MAP, sessionId=
> > >>> 2010-05-04 23:48:03,674 WARN datameer.dap.sdk.util.ManifestMetaData:
> > >> Failed to get version from the manifest file of jar
> > >>
> 'file:/data/drive0/mapred/tmp/taskTracker/archive/nurago64.local/das/jobjars/61a2133b7887b9be48d42e49002c85e0/stripped-dap-0.24.dev-job.jar/stripped-dap-0.24.dev-job.jar'
> > >> : null
> > >>> 2010-05-04 23:48:04,691 INFO org.apache.hadoop.mapred.MapTask:
> > >> numReduceTasks: 0
> > >>> 2010-05-04 23:48:04,872 INFO org.apache.hadoop.util.NativeCodeLoader:
> > >> Loaded the native-hadoop library
> > >>> 2010-05-04 23:48:04,873 INFO
> > >> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> > >> initialized native-zlib library
> > >>> 2010-05-04 23:48:04,874 INFO org.apache.hadoop.io.compress.CodecPool:
> Got
> > >> brand-new compressor
> > >>> 2010-05-05 00:59:08,753 INFO
> org.apache.hadoop.metrics.jvm.JvmMetrics:
> > >> Initializing JVM Metrics with processName=CLEANUP, sessionId=
> > >>> 2010-05-05 00:59:09,223 INFO org.apache.hadoop.mapred.TaskRunner:
> > >> Runnning cleanup for the task
> > >>> 2010-05-05 00:59:09,442 INFO org.apache.hadoop.mapred.TaskRunner:
> > >> Task:attempt_201004301437_0050_m_000001_0 is done. And is in the
> process of
> > >> commiting
> > >>> 2010-05-05 00:59:09,562 INFO org.apache.hadoop.mapred.TaskRunner:
> Task
> > >> 'attempt_201004301437_0050_m_000001_0' done.
> > >>> ----------------------------------------------------------------
> > >>>
> > >>> Any ideas?
> > >>>
> > >>> best regards
> > >>> Johannes
> > >>
> > >>
> > >>
> > >> --
> > >> /*
> > >> Joe Stein
> > >> http://www.linkedin.com/in/charmalloc
> > >> */
> > >>
> >
>
