hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Smith (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1374) TaskTracker falls into an infinite loop.
Date Sat, 02 Jun 2007 18:45:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500960
] 

Mike Smith commented on HADOOP-1374:
------------------------------------

As Kostantin mentioed, all maps are done and only the last reducer stucks and task will not
fail after 10 min, because jobtracker still gets the heartbeats. By the way, here is the core
dump when the reducers stucks and the whole cluster runs in linux:

Full thread dump Java HotSpot(TM) Client VM (1.5.0_11-b03 mixed mode):

"IPC Client connection to jobtracker_address/jobtracker_address:50002" daemon prio=1 tid=0x096f29d8
nid=0x76cc in Object.wait() [0x4cfdc000..0x4cfdcf40]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:213)
        - locked <0x4f13b5c0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:252)

"SocketListener0-19" prio=1 tid=0x0a61ccf8 nid=0x49ed in Object.wait() [0x4b08a000..0x4b08af40]
        at java.lang.Object.wait(Native Method)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:522)
        - locked <0x567ba450> (a org.mortbay.util.ThreadPool$PoolThread)

"Map output copy reporter for task task_0023_r_000085_0" daemon prio=1 tid=0x0a3c9ca8 nid=0x49c2
waiting on condition [0x4b28e000..0x4b28f040]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.ReduceTaskRunner$PingTimer.run(ReduceTaskRunner.java:189)

"MapOutputCopier task_0023_r_000085_0.294" prio=1 tid=0x0971a398 nid=0x49c0 in Object.wait()
[0x4af07000..0x4af07f40]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:267)
        - locked <0x567557b8> (a java.util.ArrayList)

"MapOutputCopier task_0023_r_000085_0.292" prio=1 tid=0x096ff088 nid=0x49be in Object.wait()
[0x4af88000..0x4af88e40]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:267)
        - locked <0x567557b8> (a java.util.ArrayList)

"MapOutputCopier task_0023_r_000085_0.290" prio=1 tid=0x09bd99f8 nid=0x49bc in Object.wait()
[0x4c1ad000..0x4c1ad140]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:267)
        - locked <0x567557b8> (a java.util.ArrayList)

"MapOutputCopier task_0023_r_000085_0.288" prio=1 tid=0x0a314ce0 nid=0x49ba in Object.wait()
[0x4bca2000..0x4bca3040]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:267)
        - locked <0x567557b8> (a java.util.ArrayList)

"MapOutputCopier task_0023_r_000085_0.286" prio=1 tid=0x093d2c18 nid=0x49b8 in Object.wait()
[0x4d15f000..0x4d15ff40]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:267)
        - locked <0x567557b8> (a java.util.ArrayList)

"Thread-27723" prio=1 tid=0x093d29a8 nid=0x49b7 waiting on condition [0x4b99c000..0x4b99cec0]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapred.ReduceTaskRunner.prepare(ReduceTaskRunner.java:572)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:143)

"MultiThreadedHttpConnectionManager cleanup" daemon prio=1 tid=0x0860d368 nid=0x7e66 in Object.wait()
[0x4d1e0000..0x4d1e1040]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x56418ca8> (a java.lang.ref.ReferenceQueue$Lock)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1082)

"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=1 tid=0x085c4fb0 nid=0x7e64
waiting on condition [0x4d2e2000..0x4d2e2f40]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:401)

"IPC Server handler 1 on 50050" daemon prio=1 tid=0x085bbaf0 nid=0x7e63 in Object.wait() [0x4d363000..0x4d363ec0]
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:533)
        - locked <0x5640dcd8> (a java.util.LinkedList)

"IPC Server handler 0 on 50050" daemon prio=1 tid=0x085c3fc8 nid=0x7e62 in Object.wait() [0x4d3e4000..0x4d3e4e40]
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:533)
        - locked <0x5640dcd8> (a java.util.LinkedList)

"IPC Server listener on 50050" daemon prio=1 tid=0x085c3ab0 nid=0x7e61 runnable [0x4d465000..0x4d4661c0]
        at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
        at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
        at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x5640de48> (a sun.nio.ch.Util$1)
        - locked <0x5640de58> (a java.util.Collections$UnmodifiableSet)
        - locked <0x5640ddf8> (a sun.nio.ch.PollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:253)


"Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50060]" prio=1 tid=0x0859f278
nid=0x7e5f runnable [0x4d56f000..0x4d5700c0]
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
        - locked <0x564028b8> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:450)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
        at org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)

"SessionScavenger" daemon prio=1 tid=0x08599fc8 nid=0x7e5e waiting on condition [0x4d5f0000..0x4d5f1040]
        at java.lang.Thread.sleep(Native Method)
        at org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractSessionManager.java:587)

"taskCleanup" daemon prio=1 tid=0x082f4f60 nid=0x7e5b waiting on condition [0x4d701000..0x4d701fc0]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1767)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359)
        at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:181)
        at java.lang.Thread.run(Thread.java:595)

"Low Memory Detector" daemon prio=1 tid=0x080c99b8 nid=0x7e59 runnable [0x00000000..0x00000000]

"CompilerThread0" daemon prio=1 tid=0x080c8408 nid=0x7e58 waiting on condition [0x00000000..0x4e76b828]

"Signal Dispatcher" daemon prio=1 tid=0x080c7580 nid=0x7e57 waiting on condition [0x00000000..0x00000000]

"Finalizer" daemon prio=1 tid=0x080bcf00 nid=0x7e53 in Object.wait() [0x4e86e000..0x4e86e140]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x562d2798> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x080bc238 nid=0x7e52 in Object.wait() [0x4e8ee000..0x4e8ef0c0]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x562d2818> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805dc50 nid=0x7e3b in Object.wait() [0xbfc72000..0xbfc725c8]
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:509)
        - locked <0x56311618> (a [I)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:899)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1589)

"VM Thread" prio=1 tid=0x080b9540 nid=0x7e51 runnable

"VM Periodic Task Thread" prio=1 tid=0x080cae98 nid=0x7e5a waiting on condition






> TaskTracker falls into an infinite loop.
> ----------------------------------------
>
>                 Key: HADOOP-1374
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1374
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Konstantin Shvachko
>            Assignee: Arun C Murthy
>             Fix For: 0.14.0
>
>         Attachments: DataNode1.log, DataNode2.log, JobTracker.log, NameNode.log, TaskTracker1.log,
TaskTracker2.log, TestDFSIO.log
>
>
> All maps had been completed successfully. I had only one reduce task during which
> TaskTracker infinitely outputs:
> 07/05/15 19:35:41 INFO mapred.TaskTracker: task_0001_r_000000_0 0.16666667% reduce >
copy (4 of 8 at 0.00 MB/s) > 
> 07/05/15 19:35:42 INFO mapred.TaskTracker: task_0001_r_000000_0 0.16666667% reduce >
copy (4 of 8 at 0.00 MB/s) > 
> 07/05/15 19:35:43 INFO mapred.TaskTracker: task_0001_r_000000_0 0.16666667% reduce >
copy (4 of 8 at 0.00 MB/s) > 
> 07/05/15 19:35:44 INFO mapred.TaskTracker: task_0001_r_000000_0 0.16666667% reduce >
copy (4 of 8 at 0.00 MB/s) > 
> 07/05/15 19:35:45 INFO mapred.TaskTracker: task_0001_r_000000_0 0.16666667% reduce >
copy (4 of 8 at 0.00 MB/s) > 
> JobTracker does not log anything about task task_0001_r_000000_0 except for
> 07/05/15 19:49:01 INFO mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000,
for tracker 'tracker_my-host.com:50050'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message