hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Brooks <i.bro...@sensewhere.com>
Subject Re: 100% CPU utilization on idle HDFS Data Nodes
Date Tue, 03 Jun 2014 16:44:44 GMT
Hi Shayan,

If you restart one of the datanodes, does that node go back to normal cpu usage? if so that
looks like the same issue im seeing on my nodes, though mine will go to 200% over time on
a 4 cpu host.

I havent been able to track the cause down yet. Heavy use of HDFS will cause the node to jump
to the 100% sooner and it stays there even when doing very lilttle.



Ted, logs from one of my nodes to go with Shayan's

2014-06-03 17:01:28,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
BP-2121456822-10.143.38.149-1396953188241:blk_1074075941_335194, type=LAST_IN_PIPELINE, downstreams=0:[]
terminating
2014-06-03 17:06:41,860 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
Scheduling blk_1074074152_333405 file /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152
for deletion
2014-06-03 17:06:41,871 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
Deleted BP-2121456822-10.143.38.149-1396953188241 blk_1074074152_333405 file /home/hadoop/hdfs/datanode/current/BP-2121456822-10.143.38.149-1396953188241/current/finalized/subdir12/blk_1074074152
2014-06-03 17:08:32,843 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Got a command
from standby NN - ignoring command:2
2014-06-03 17:13:44,320 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074075943_335196
src: /10.143.38.100:26618 dest: /10.143.38.100:50010
2014-06-03 17:13:44,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace:
src: /10.143.38.100:25817, dest: /10.143.38.100:50010, bytes: 863, op: HDFS_WRITE, cliID:
DFSClient_hb_rs_sw-hmaster-002,16020,1401717607010_805952715_28, offset: 0, srvID: 62f7174f-a44d-4cc1-b62c-095782a86164,
blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, duration: 3600148963947
2014-06-03 17:13:44,352 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
BP-2121456822-10.143.38.149-1396953188241:blk_1074075489_334742, type=HAS_DOWNSTREAM_IN_PIPELINE
terminating
2014-06-03 17:13:54,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace:
src: /10.143.38.104:21368, dest: /10.143.38.100:50010, bytes: 227791, op: HDFS_WRITE, cliID:
DFSClient_hb_rs_sw-hadoop-004,16020,1401718647248_-466626109_28, offset: 0, srvID: 62f7174f-a44d-4cc1-b62c-095782a86164,
blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, duration: 3600133166499
2014-06-03 17:13:54,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
BP-2121456822-10.143.38.149-1396953188241:blk_1074075491_334744, type=LAST_IN_PIPELINE, downstreams=0:[]
terminating
2014-06-03 17:36:17,405 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074075950_335203
src: /10.143.38.149:48959 dest: /10.143.38.100:50010


top shows
top - 17:41:55 up 18 days, 23:59,  2 users,  load average: 1.06, 1.04, 0.93
Tasks: 139 total,   1 running, 137 sleeping,   1 stopped,   0 zombie
Cpu(s): 16.4%us, 25.9%sy,  0.0%ni, 56.2%id,  0.0%wa,  0.0%hi,  0.0%si,  1.4%st
Mem:   8059432k total,  5870572k used,  2188860k free,   181076k buffers
Swap:   835576k total,        0k used,   835576k free,  2493828k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                     
                                                                                         
                                                         
25450 hadoop    20   0 1728m 332m  17m S 95.8  4.2  49:24.47 java                        
                                                                                         
                                                         
25147 hbase     20   0 4998m 284m  15m S 13.4  3.6   7:28.30 java                        
                                                                                         
                                                         
 7212 flume     20   0 4558m 399m  20m S  1.9  5.1   0:25.61 java                        
                                   

jstack fails, im probably missing something, just not sure what

 sudo jstack -J-d64 -m 25450
Attaching to process ID 25450, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.51-b03
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
        at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: java.lang.RuntimeException: Unable to deduce type of thread from address 0x00007f0104010800
(expected type JavaThread, CompilerThread, ServiceThread, JvmtiAgentThread, or SurrogateLockerThread)
        at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:162)
        at sun.jvm.hotspot.runtime.Threads.first(Threads.java:150)
        at sun.jvm.hotspot.tools.PStack.initJFrameCache(PStack.java:216)
        at sun.jvm.hotspot.tools.PStack.run(PStack.java:67)
        at sun.jvm.hotspot.tools.PStack.run(PStack.java:54)
        at sun.jvm.hotspot.tools.PStack.run(PStack.java:49)
        at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
        ... 6 more
Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address
0x00007f0104010800
        at sun.jvm.hotspot.runtime.InstanceConstructor.newWrongTypeException(InstanceConstructor.java:62)
        at sun.jvm.hotspot.runtime.VirtualConstructor.instantiateWrapperFor(VirtualConstructor.java:80)
        at sun.jvm.hotspot.runtime.Threads.createJavaThreadWrapper(Threads.java:158)
        ... 14 more


-Ian Brooks


On Tuesday 03 Jun 2014 12:34:36 Shayan Pooya wrote:
> I have a three node HDFS cluster with a name-node.  There is absolutely no
> IO going on this cluster or any jobs running on it and I just use it for
> testing the Disco HDFS integration.
> I noticed that two of the three data-nodes are using 100% CPU.  They have
> been running for a long time (2 months) but with no jobs running on them
> and almost no usage:
> 
> $ hadoop version
> Hadoop 2.3.0
> Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
> Compiled by jenkins on 2014-02-11T13:40Z
> Compiled with protoc 2.5.0
> From source with checksum dfe46336fbc6a044bc124392ec06b85
> 
> Is this a known bug?
-- 
-Ian Brooks
Senior server administrator - Sensewhere

Mime
View raw message