hadoop-mapreduce-user mailing list archives

From Ted Xu <ted.xu...@gmail.com>
Subject Re: MapReduce Child don't exit?
Date Wed, 18 Nov 2009 01:54:00 GMT
Thanks for the reply, that's very helpful.

I think it is a bug in DFSClient.
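
To make the lock pattern in the quoted thread dump concrete, here is a minimal Java sketch. It is not Hadoop code: FakeStream, ack() and the thread names are hypothetical stand-ins. It only assumes what the dump shows, namely that one thread holds the stream's monitor while waiting on a queue for acks, and a second thread (the shutdown hook in the dump) blocks trying to close the same stream.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the locking seen in the dump; NOT Hadoop code.
final class BlockedCloseDemo {

    /** Mimics a stream whose close() holds its own monitor while waiting for acks. */
    static final class FakeStream {
        private final Object ackQueue = new Object(); // plays the role of the LinkedList
        private boolean acked = false;
        final CountDownLatch waitingForAck = new CountDownLatch(1);

        synchronized void close() throws InterruptedException {
            synchronized (ackQueue) {
                waitingForAck.countDown();
                while (!acked) {
                    ackQueue.wait(); // releases ackQueue only; the stream monitor stays held
                }
            }
        }

        /** Simulates the datanode ack arriving; deliberately not synchronized on the stream. */
        void ack() {
            synchronized (ackQueue) {
                acked = true;
                ackQueue.notifyAll();
            }
        }
    }

    private static Thread closer(FakeStream stream, String name) {
        Thread t = new Thread(() -> {
            try {
                stream.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    /** Returns the state of the second closer while the first is still waiting for an ack. */
    static Thread.State observeSecondCloser() {
        FakeStream stream = new FakeStream();
        Thread first = closer(stream, "main-like");
        Thread second = closer(stream, "hook-like");
        try {
            first.start();
            stream.waitingForAck.await();   // first closer now holds the stream monitor
            second.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (second.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State seen = second.getState();
            stream.ack();                   // without this, both threads would hang forever
            first.join();
            second.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second closer state: " + observeSecondCloser());
    }
}
```

In the sketch the ack eventually arrives so the demo terminates. In our Child process it apparently never does, so the hook stays blocked and every thread calling System.exit() queues up behind it.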
2009/11/17 Jason Venner <jason.hadoop@gmail.com>

> The DFS client code waits until all of the datanodes that are going to
> hold a replica of your output's blocks have ack'd.
> If you are pausing there, most likely something is wrong in your HDFS
> cluster.
>
>
> On Thu, Nov 12, 2009 at 7:06 AM, Ted Xu <ted.xu.ml@gmail.com> wrote:
>
>>  Hi all,
>>
>> We are using hadoop-0.19.1 on about 200 nodes. We find that many slaves
>> keep a Child process running even after the job is done.
>>
>> Here is an example; the process has been running since "August 09"!
>>
>>
>>> 1000     24625     1  0 Aug09 ?        00:00:38 (...java... classpath)
>>> org.apache.hadoop.mapred.Child 127.0.0.1 55998
>>> attempt_200908081205_0054_r_000093_0 441920924
>>
>>
>> The jstack output for the process is:
>>
>>
>>> 2009-11-12 14:58:59
>>> Full thread dump Java HotSpot(TM) Server VM (11.0-b15 mixed mode):
>>>
>>> "Attach Listener" daemon prio=10 tid=0x08168400 nid=0x457a waiting on
>>> condition [0x00000000..0x00000000]
>>>    java.lang.Thread.State: RUNNABLE
>>>
>>> "Thread-2" daemon prio=10 tid=0x08170400 nid=0x60f8 waiting for monitor
>>> entry [0xa33ad000..0xa33adfd0]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3085)
>>>         - waiting to lock <0xa84d12a8> (a
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
>>>         - locked <0xa84cba48> (a
>>> org.apache.hadoop.hdfs.DFSClient$LeaseChecker)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:209)
>>>         - locked <0xa84cba60> (a org.apache.hadoop.hdfs.DFSClient)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:264)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
>>>         - locked <0xa84a1e00> (a org.apache.hadoop.fs.FileSystem$Cache)
>>>         at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)
>>>         - locked <0xa84a26f0> (a
>>> org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>>>
>>> "SIGTERM handler" daemon prio=10 tid=0x08176800 nid=0x60f6 in
>>> Object.wait() [0xa35ad000..0xa35ae0d0]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0xa84a26f0> (a
>>> org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>>>         at java.lang.Thread.join(Thread.java:1143)
>>>         - locked <0xa84a26f0> (a
>>> org.apache.hadoop.fs.FileSystem$ClientFinalizer)
>>>         at java.lang.Thread.join(Thread.java:1196)
>>>         at
>>> java.lang.ApplicationShutdownHooks.run(ApplicationShutdownHooks.java:79)
>>>         at java.lang.Shutdown.runHooks(Shutdown.java:89)
>>>         at java.lang.Shutdown.sequence(Shutdown.java:133)
>>>         at java.lang.Shutdown.exit(Shutdown.java:178)
>>>         - locked <0xa4556020> (a java.lang.Class for java.lang.Shutdown)
>>>         at java.lang.Terminator$1.handle(Terminator.java:35)
>>>         at sun.misc.Signal$1.run(Signal.java:195)
>>>         at java.lang.Thread.run(Thread.java:619)
>>>
>>> "Comm thread for attempt_200908081205_0054_r_000093_0" daemon prio=10
>>> tid=0x083f0000 nid=0x6049 waiting for monitor entry [0xa35fe000..0xa35ff050]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.lang.Shutdown.exit(Shutdown.java:178)
>>>         - waiting to lock <0xa4556020> (a java.lang.Class for
>>> java.lang.Shutdown)
>>>         at java.lang.Runtime.exit(Runtime.java:90)
>>>         at java.lang.System.exit(System.java:906)
>>>         at org.apache.hadoop.mapred.Task$1.run(Task.java:430)
>>>         at java.lang.Thread.run(Thread.java:619)
>>>
>>> "Thread for syncLogs" daemon prio=10 tid=0xa39cc800 nid=0x6041 waiting
>>> for monitor entry [0xa38a3000..0xa38a3fd0]
>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>         at java.lang.Shutdown.exit(Shutdown.java:178)
>>>         - waiting to lock <0xa4556020> (a java.lang.Class for
>>> java.lang.Shutdown)
>>>         at java.lang.Runtime.exit(Runtime.java:90)
>>>         at java.lang.System.exit(System.java:906)
>>>         at org.apache.hadoop.mapred.Child$1.run(Child.java:84)
>>>
>>> "Low Memory Detector" daemon prio=10 tid=0x0811c800 nid=0x603e runnable
>>> [0x00000000..0x00000000]
>>>    java.lang.Thread.State: RUNNABLE
>>>
>>> "CompilerThread1" daemon prio=10 tid=0x0811a400 nid=0x603d waiting on
>>> condition [0x00000000..0xa3bfe5c8]
>>>    java.lang.Thread.State: RUNNABLE
>>>
>>> "CompilerThread0" daemon prio=10 tid=0x08118000 nid=0x603c waiting on
>>> condition [0x00000000..0xa3df5608]
>>>    java.lang.Thread.State: RUNNABLE
>>>
>>> "Signal Dispatcher" daemon prio=10 tid=0x08116800 nid=0x603b runnable
>>> [0x00000000..0xa3e46d90]
>>>    java.lang.Thread.State: RUNNABLE
>>>
>>> "Finalizer" daemon prio=10 tid=0x08104000 nid=0x603a in Object.wait()
>>> [0xa3e97000..0xa3e97e50]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0xa84887a0> (a java.lang.ref.ReferenceQueue$Lock)
>>>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>         - locked <0xa84887a0> (a java.lang.ref.ReferenceQueue$Lock)
>>>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>         at
>>> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>
>>> "Reference Handler" daemon prio=10 tid=0x08102800 nid=0x6039 in
>>> Object.wait() [0xa3ee8000..0xa3ee8fd0]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0xa84a93c0> (a java.lang.ref.Reference$Lock)
>>>         at java.lang.Object.wait(Object.java:485)
>>>         at
>>> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>         - locked <0xa84a93c0> (a java.lang.ref.Reference$Lock)
>>>
>>> "main" prio=10 tid=0x0805b000 nid=0x6033 in Object.wait()
>>> [0xb7dc6000..0xb7dc7298]
>>>    java.lang.Thread.State: WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         - waiting on <0xa84cff68> (a java.util.LinkedList)
>>>         at java.lang.Object.wait(Object.java:485)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3025)
>>>         - locked <0xa84cff68> (a java.util.LinkedList)
>>>         - locked <0xa84d12a8> (a
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3105)
>>>         - locked <0xa84d12a8> (a
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>>>         at
>>> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:102)
>>>         - locked <0xa84cffd0> (a
>>> org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter)
>>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:158)
>>>
>>> "VM Thread" prio=10 tid=0x080ff000 nid=0x6038 runnable
>>>
>>> "GC task thread#0 (ParallelGC)" prio=10 tid=0x08062400 nid=0x6034
>>> runnable
>>>
>>> "GC task thread#1 (ParallelGC)" prio=10 tid=0x08063800 nid=0x6035
>>> runnable
>>>
>>> "GC task thread#2 (ParallelGC)" prio=10 tid=0x08065000 nid=0x6036
>>> runnable
>>>
>>> "GC task thread#3 (ParallelGC)" prio=10 tid=0x08066400 nid=0x6037
>>> runnable
>>>
>>> "VM Periodic Task Thread" prio=10 tid=0x0811e400 nid=0x603f waiting on
>>> condition
>>>
>>> JNI global references: 738
>>>
>> It seems the process is blocked in the DFS client. Can anyone tell me how
>> to avoid it?
>>
>> Best Regards,
>>
>> Ted Xu
>>
>
>
>
> --
>  Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>


Best Regards,

Ted Xu
