hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patai Sangbutsarakum <silvianhad...@gmail.com>
Subject Re: secondary namenode is hang at post
Date Wed, 11 Dec 2013 22:49:44 GMT
Thanks adam,

2013-12-11 14:43:07
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.7-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000041057000 nid=0x6b95 waiting
on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"DestroyJavaVM" prio=10 tid=0x00007fb6e0009800 nid=0x1b6f waiting on
condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"SecondaryNameNode Status
Name Node Address    : pnn/x.x.x.x:9000
Start Time           : Wed Dec 11 13:04:22 PST 2013
Last Checkpoint Time : --
Checkpoint Period    : 3600 seconds
Checkpoint Size      : 67108864 MB
Checkpoint Dirs      : [/hadoop/namesecondary, /hadoop/nfs_namesecondary]
Checkpoint Edits Dirs: [/hadoop/namesecondary, /hadoop/nfs_namesecondary]"
daemon prio=10 tid=0x00007fb6e0471800 nid=0x1b95 runnable
[0x00007fad08f43000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0x00007fb4077b5108> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
- locked <0x00007fb4077a9218> (a
sun.net.www.protocol.http.HttpURLConnection)
at
sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2300)
at
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:192)
at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:404)
at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:473)
at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:333)
at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
at java.lang.Thread.run(Thread.java:662)

"Timer-0" daemon prio=10 tid=0x00007fb6e0484000 nid=0x1b94 in Object.wait()
[0x00007fad09044000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007fad3e671138> (a java.util.TaskQueue)
at java.util.TimerThread.mainLoop(Timer.java:509)
- locked <0x00007fad3e671138> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:462)

"1408732008@qtp-421539177-1 - Acceptor0 SelectChannelConnector@snn:50090"
prio=10 tid=0x00007fb6e0478800 nid=0x1b93 runnable [0x00007fad09145000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x00007fad3e6c8f48> (a sun.nio.ch.Util$2)
- locked <0x00007fad3e6c8f60> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007fad3e6c8ed0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:511)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:193)
at
org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

"162291475@qtp-421539177-0" prio=10 tid=0x00007fb6e0477800 nid=0x1b92 in
Object.wait() [0x00007fad09246000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007fad3e6711a0> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
- locked <0x00007fad3e6711a0> (a
org.mortbay.thread.QueuedThreadPool$PoolThread)

"RMI TCP Accept-0" daemon prio=10 tid=0x00007fb6e0322000 nid=0x1b8e
runnable [0x00007fad09cd1000]
   java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
- locked <0x00007fad3e671330> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at java.net.ServerSocket.accept(ServerSocket.java:430)
at
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
at java.lang.Thread.run(Thread.java:662)

"Low Memory Detector" daemon prio=10 tid=0x00007fb6e00b8800 nid=0x1b8d
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007fb6e00b6800 nid=0x1b8c
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007fb6e00b3800 nid=0x1b8b
waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007fb6e00b1800 nid=0x1b8a
runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007fb6e0094800 nid=0x1b89 in
Object.wait() [0x00007fad0aa8d000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007fad3e671898> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x00007fad3e671898> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007fb6e0092800 nid=0x1b88 in
Object.wait() [0x00007fad0ab8e000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00007fad3e669458> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00007fad3e669458> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00007fb6e008c000 nid=0x1b87 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fb6e001c800 nid=0x1b70
runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fb6e001e800 nid=0x1b71
runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fb6e0020000 nid=0x1b72
runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fb6e0022000 nid=0x1b73
runnable

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fb6e0024000 nid=0x1b74
runnable

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fb6e0025800 nid=0x1b75
runnable

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fb6e0027800 nid=0x1b76
runnable

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fb6e0029800 nid=0x1b77
runnable

"GC task thread#8 (ParallelGC)" prio=10 tid=0x00007fb6e002b000 nid=0x1b78
runnable

"GC task thread#9 (ParallelGC)" prio=10 tid=0x00007fb6e002d000 nid=0x1b79
runnable

"GC task thread#10 (ParallelGC)" prio=10 tid=0x00007fb6e002f000 nid=0x1b7a
runnable

"GC task thread#11 (ParallelGC)" prio=10 tid=0x00007fb6e0030800 nid=0x1b7b
runnable

"GC task thread#12 (ParallelGC)" prio=10 tid=0x00007fb6e0032800 nid=0x1b7c
runnable

"GC task thread#13 (ParallelGC)" prio=10 tid=0x00007fb6e0034800 nid=0x1b7d
runnable

"GC task thread#14 (ParallelGC)" prio=10 tid=0x00007fb6e0036000 nid=0x1b7e
runnable

"GC task thread#15 (ParallelGC)" prio=10 tid=0x00007fb6e0038000 nid=0x1b7f
runnable

"GC task thread#16 (ParallelGC)" prio=10 tid=0x00007fb6e003a000 nid=0x1b80
runnable

"GC task thread#17 (ParallelGC)" prio=10 tid=0x00007fb6e003b800 nid=0x1b81
runnable

"GC task thread#18 (ParallelGC)" prio=10 tid=0x00007fb6e003d800 nid=0x1b82
runnable

"GC task thread#19 (ParallelGC)" prio=10 tid=0x00007fb6e003f800 nid=0x1b83
runnable

"GC task thread#20 (ParallelGC)" prio=10 tid=0x00007fb6e0041000 nid=0x1b84
runnable

"GC task thread#21 (ParallelGC)" prio=10 tid=0x00007fb6e0043000 nid=0x1b85
runnable

"GC task thread#22 (ParallelGC)" prio=10 tid=0x00007fb6e0045000 nid=0x1b86
runnable

"VM Periodic Task Thread" prio=10 tid=0x00007fb6e0335800 nid=0x1b8f waiting
on condition

JNI global references: 1363



On Wed, Dec 11, 2013 at 2:40 PM, Adam Kawa <kawa.adam@gmail.com> wrote:

> It looks that it can not copy the new checkpoint into the NameNode. Can
> you copy-past what jstack says?
>
> $ sudo -u hdfs jstack <snn-pid>
>
>
>
> 2013/12/11 Patai Sangbutsarakum <silvianhadoop@gmail.com>
>
>> It just happens without changing anything in the cluster. Secondary
>> namenode node has been working fine until today i notice that in second
>> namenode log file stop at.
>>
>> 2013-12-11 13:17:41,083 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 3941631662
>> saved in 61 seconds.
>> 2013-12-11 13:19:15,446 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 3941631662
>> saved in 94 seconds.
>> 2013-12-11 13:19:29,760 INFO
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
>> primarynn:50070putimage=1&port=50090&machine=snn2&token=-19:1944437056:1306462484827:1386562944000:1386559939508&newChecksum=2ad3e008becbd2cd5196b2b6ba1498ac
>>
>>
>> #date
>> Wed Dec 11 14:16:30 PST 2013
>>
>> even after 1 hour passed it's not finish doing the checkpoint. looking at
>> timestamp of current/fsimage.ckpt at primary namenode; it doesn't show
>> progress in size and timestamp of the file is days ago.
>>
>> already tried to clean the current in snn and restart secondarynamenode
>> process, but SNN still stop at the same spot even thought the snn process
>> is still exist.
>>
>>
>> -P
>>
>
>

Mime
View raw message