hadoop-mapreduce-user mailing list archives

From Ian Brooks <i.bro...@sensewhere.com>
Subject Constant high cpu usage on datanode process
Date Wed, 14 May 2014 12:47:32 GMT
Hi,

I have noticed recently that the datanodes on a test HDFS cluster I'm running sit permanently
at 100% CPU usage after a few hours of minimal use, and over time climb to 200% CPU even
when there is almost nothing happening on HDFS.

Node 1:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15607 hadoop    20   0 1746m 480m 5380 S 102.2  6.1 190:24.45 java

ps -ef |grep 15607

hadoop   15607     1 14 May13 ?        03:12:33 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
-Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs
-Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true
-Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-sw-hmaster-001.log
-Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA
-Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS
-Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode




Node 6:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10809 hadoop    20   0 1750m 497m  17m S 200.0  6.3 277:07.34 java

ps -ef |grep 10809

hadoop   10809     1 99 09:56 ?        04:38:04 /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
-Dproc_datanode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs
-Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop
-Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true
-Dhadoop.log.dir=/usr/local/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop-hadoop-datanode-sw-hadoop-006.log
-Dhadoop.home.dir=/usr/local/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA
-Djava.library.path=/usr/local/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS
-Dhadoop.security.logger=ERROR,RFAS -Dcom.sun.management.jmxremote.port=1235 -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
-Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
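
(Node 6 is the one with remote JMX enabled for the monitoring mentioned below, hence the extra
-Dcom.sun.management.jmxremote* flags in its command line - roughly the following in hadoop-env.sh,
though the exact form may differ.)

export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1235 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false $HADOOP_DATANODE_OPTS"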



I thought initially it might be GC, but monitoring the datanode process via JMX shows almost no
GC activity (< 1%).
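
(For anyone wanting to reproduce the GC check without JMX, a rough command-line equivalent,
assuming the JDK tools are installed on the node, is something like:

jstat -gcutil 15607 5000

which samples collector utilisation every 5 seconds; the PID is just node 1's datanode from above.)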

If I restart the datanode process then CPU usage drops back down to 1-5% for a few hours,
then jumps back up to 100% again.

The version of Hadoop I'm using is hadoop-2.3.0 and the main use is for HBase, which is also
not handling much traffic on the test platform. There are 8 nodes in the cluster, each with
8GB RAM, 4GB of which is assigned to HBase; top shows 1.9GB RAM free.

Hadoop heap size is set to 1GB, with usage varying between 100 and 200MB. VisualVM shows most
of the CPU time to be in:

org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService() - 33.5%
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run() - 23.3%
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select() - 16.9%
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportReceivedDeletedBlocks() - 12.9%
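
(If it's useful, the busy threads can be tied back to these methods by cross-referencing the
per-thread CPU view with a thread dump, e.g. something like:

top -H -p 10809
jstack -l 10809

matching top's thread IDs, converted to hex, against the nid= values in the jstack output.
I haven't pasted that output here.)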


Any ideas why this would be happening? Has anyone else seen this behaviour?

-Ian Brooks

