hbase-user mailing list archives

From "George P. Stathis" <gstat...@traackr.com>
Subject Re: High OS Load Numbers when idle
Date Tue, 17 Aug 2010 22:49:05 GMT
Actually, there is nothing in %wa but a ton sitting in %id. This is
from the Master:

top - 18:30:24 up 5 days, 20:10,  1 user,  load average: 2.55, 1.99, 1.25
Tasks:  89 total,   1 running,  88 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
Mem:  17920228k total,  2795464k used, 15124764k free,   248428k buffers
Swap:        0k total,        0k used,        0k free,  1398388k cached

I have atop installed, and it reports the hadoop/hbase java daemons
as the most active processes (though they barely take any CPU time and
spend most of it sleeping):

ATOP - domU-12-31-39-18-1 2010/08/17  18:31:46               10 seconds elapsed
PRC | sys   0.01s | user   0.00s | #proc     89 | #zombie    0 | #exit      0 |
CPU | sys      0% | user      0% | irq       0% | idle    200% | wait      0% |
cpu | sys      0% | user      0% | irq       0% | idle    100% | cpu000 w  0% |
CPL | avg1   2.55 | avg5    2.12 | avg15   1.35 | csw     2397 | intr    2034 |
MEM | tot   17.1G | free   14.4G | cache   1.3G | buff  242.6M | slab  193.1M |
SWP | tot    0.0M | free    0.0M |              | vmcom   1.6G | vmlim   8.5G |
NET | transport   | tcpi     330 | tcpo     169 | udpi     566 | udpo     147 |
NET | network     | ipi      896 | ipo      316 | ipfrw      0 | deliv    896 |
NET | eth0   ---- | pcki     777 | pcko     197 | si  248 Kbps | so   70 Kbps |
NET | lo     ---- | pcki     119 | pcko     119 | si    9 Kbps | so    9 Kbps |

  PID  CPU COMMAND-LINE                                                  1/1
17613   0% atop
17150   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
16527   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
16839   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
16735   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
17083   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor

Same with htop:

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
16527 ubuntu    20   0 2352M   98M 10336 S  0.0  0.6  0:42.05
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
-Dhadoop.log.dir=/var/log/h
16735 ubuntu    20   0 2403M 81544 10236 S  0.0  0.5  0:01.56
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
-Dhadoop.log.dir=/var/log/h
17083 ubuntu    20   0 4557M 45388 10912 S  0.0  0.3  0:00.65
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -server -XX:+Heap
    1 root      20   0 23684  1880  1272 S  0.0  0.0  0:00.23 /sbin/init
  587 root      20   0  247M  4088  2432 S  0.0  0.0 -596523h-14:-8
/usr/sbin/console-kit-daemon --no-daemon
 3336 root      20   0 49256  1092   540 S  0.0  0.0  0:00.36 /usr/sbin/sshd
16430 nobody    20   0 34408  3704  1060 S  0.0  0.0  0:00.01 gmond
17150 ubuntu    20   0 2519M  112M 11312 S  0.0  0.6 -596523h-14:-8
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -server -XX
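[Editor's aside, not part of the original message: one Linux-specific check
worth running in this situation is counting tasks in uninterruptible sleep.
The Linux load average includes tasks in state D (typically blocked on disk
or network I/O) as well as runnable tasks, so load can sit above the CPU
count while %us and %sy stay at zero. A minimal sketch:]

```shell
# Count tasks in uninterruptible sleep (state D). These contribute to
# the Linux load average even though they consume no CPU time.
dcount=$(ps -eo state= | grep -c '^D' || true)
echo "$dcount tasks in uninterruptible sleep (D state)"
```

[If that count is consistently nonzero while the CPUs are idle, the load is
coming from blocked I/O rather than CPU contention; on EC2, EBS latency is a
common cause.]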


So I'm a bit perplexed. Are there any Hadoop/HBase-specific tricks
that can reveal what these processes are doing?
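[Editor's aside, not part of the original message: a generic JVM-side way to
answer this is a thread dump. The sketch below uses jstack, which ships with
the full JDK; the PID 16527 is just one JVM from the listing above and is a
placeholder, not a fixed value.]

```shell
#!/bin/sh
# Sketch: dump the thread stacks of a seemingly idle Java daemon.
# Substitute the PID of the process you want to inspect.
PID=${1:-16527}

if command -v jstack >/dev/null 2>&1; then
  # Threads in WAITING/TIMED_WAITING are genuinely idle; RUNNABLE
  # threads are the ones worth reading closely in the full dump.
  jstack "$PID" 2>/dev/null | grep -c 'RUNNABLE' || echo "could not attach to $PID"
else
  echo "jstack not found; it requires a full JDK, not just a JRE"
fi
```

[To map a busy native thread back to a Java thread, `top -H -p <pid>` shows
per-thread CPU, and the thread ID (in hex) matches the nid= field in the
jstack output.]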

-GS



On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
> It's not normal, but then again I don't have access to your machines
> so I can only speculate.
>
> Does "top" show you which process is in %wa? If so and it's a java
> process, can you figure what's going on in there?
>
> J-D
>
> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis <gstathis@gmail.com> wrote:
> > Hello,
> >
> > We have just setup a new cluster on EC2 using Hadoop 0.20.2 and HBase
> > 0.20.3. Our setup currently consists of one master and four
> > slaves with a replication factor of 2:
> >
> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode, 1
> > secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its own
> > dedicated EBS drive)
> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode, 1
> > tasktracker, 1 regionserver
> >
> > We have also installed Ganglia to monitor the cluster stats as we are about
> > to run some performance tests but, right out of the box, we are noticing
> > high system loads (especially on the master node) without any activity
> > happening on the cluster. Of course, the CPUs are not being utilized at all,
> > but Ganglia is reporting almost all nodes in the red as the 1, 5 and 15
> > minute load averages are all above 100% most of the time (i.e. there are more
> > than two processes at a time competing for the 2 CPUs' time).
> >
> > Question1: is this normal?
> > Question2: does it matter since each process barely uses any of the CPU
> > time?
> >
> > Thank you in advance and pardon the noob questions.
> >
> > -GS
> >
