From: "George P. Stathis"
Date: Tue, 17 Aug 2010 18:49:05 -0400
Subject: Re: High OS Load Numbers when idle
To: user@hbase.apache.org

Actually, there is nothing in %wa but a ton sitting in %id. This is from
the Master:

top - 18:30:24 up 5 days, 20:10,  1 user,  load average: 2.55, 1.99, 1.25
Tasks:  89 total,   1 running,  88 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
Mem:  17920228k total,  2795464k used, 15124764k free,   248428k buffers
Swap:        0k total,        0k used,        0k free,  1398388k cached

I have atop installed, which is reporting the hadoop/hbase java daemons as
the most active processes (barely taking any CPU time though, and most of
the time in sleep mode):

ATOP - domU-12-31-39-18-1    2010/08/17  18:31:46       10 seconds elapsed
PRC | sys   0.01s | user   0.00s | #proc     89 | #zombie    0 | #exit      0 |
CPU | sys      0% | user      0% | irq       0% | idle    200% | wait      0% |
cpu | sys      0% | user      0% | irq       0% | idle    100% | cpu000 w  0% |
CPL | avg1   2.55 | avg5    2.12 | avg15   1.35 | csw     2397 | intr    2034 |
MEM | tot   17.1G | free   14.4G | cache   1.3G | buff  242.6M | slab  193.1M |
SWP | tot    0.0M | free    0.0M |              | vmcom   1.6G | vmlim   8.5G |
NET | transport   | tcpi     330 | tcpo     169 | udpi     566 | udpo     147 |
NET | network     | ipi      896 | ipo      316 | ipfrw      0 | deliv    896 |
NET | eth0   ---- | pcki     777 | pcko     197 | si  248 Kbps | so   70 Kbps |
NET | lo     ---- | pcki     119 | pcko     119 | si    9 Kbps | so    9 Kbps |

  PID  CPU  COMMAND-LINE                                                 1/1
17613   0%  atop
17150   0%  /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
16527   0%  /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
16839   0%  /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
16735   0%  /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
17083   0%  /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor

Same with htop:

  PID USER    PRI NI  VIRT   RES   SHR S CPU% MEM%   TIME+         Command
16527 ubuntu   20  0 2352M   98M 10336 S  0.0  0.6  0:42.05        /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/h
16735 ubuntu   20  0 2403M 81544 10236 S  0.0  0.5  0:01.56        /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/h
17083 ubuntu   20  0 4557M 45388 10912 S  0.0  0.3  0:00.65        /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX:+Heap
    1 root     20  0 23684  1880  1272 S  0.0  0.0  0:00.23        /sbin/init
  587 root     20  0  247M  4088  2432 S  0.0  0.0 -596523h-14:-8  /usr/sbin/console-kit-daemon --no-daemon
 3336 root     20  0 49256  1092   540 S  0.0  0.0  0:00.36        /usr/sbin/sshd
16430 nobody   20  0 34408  3704  1060 S  0.0  0.0  0:00.01        gmond
17150 ubuntu   20  0 2519M  112M 11312 S  0.0  0.6 -596523h-14:-8  /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX

So I'm a bit perplexed. Are there any hadoop / hbase specific tricks that
can reveal what these processes are doing?

-GS

On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans wrote:
>
> It's not normal, but then again I don't have access to your machines
> so I can only speculate.
>
> Does "top" show you which process is in %wa? If so and it's a java
> process, can you figure out what's going on in there?
>
> J-D
>
> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis wrote:
> > Hello,
> >
> > We have just set up a new cluster on EC2 using Hadoop 0.20.2 and HBase
> > 0.20.3. Our small setup as of right now consists of one master and four
> > slaves with a replication factor of 2:
> >
> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode, 1
> > secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its own
> > dedicated EBS drive)
> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode, 1
> > tasktracker, 1 regionserver
> >
> > We have also installed Ganglia to monitor the cluster stats as we are about
> > to run some performance tests but, right out of the box, we are noticing
> > high system loads (especially on the master node) without any activity
> > happening on the cluster. Of course, the CPUs are not being utilized at all,
> > but Ganglia is reporting almost all nodes in the red as the 1, 5 and 15
> > minute load averages are all above 100% most of the time (i.e. there are more
> > than two processes at a time competing for the 2 CPUs' time).
> >
> > Question 1: is this normal?
> > Question 2: does it matter since each process barely uses any of the CPU
> > time?
> >
> > Thank you in advance and pardon the noob questions.
> >
> > -GS
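
For reference, the "above 100%" reading mentioned in the original question is the load average divided by the number of CPUs. Below is a minimal Python sketch of that arithmetic, using the load figures from the top output on the master (2.55, 1.99, 1.25) and its 2 CPUs. The function name load_per_cpu is hypothetical and chosen only for illustration; this is not how Ganglia itself computes its display.

```python
def load_per_cpu(load_averages, cpu_count):
    """Express each load average as a percentage of the available CPUs.

    A value above 100 means that, on average, more tasks were counted
    in the load figure than there are CPUs to run them.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return [round(100.0 * load / cpu_count, 1) for load in load_averages]


# 1, 5 and 15 minute load averages taken from the top output on the master,
# which has 2 CPUs (illustrative input, copied from the message above).
master_load = (2.55, 1.99, 1.25)
percentages = load_per_cpu(master_load, 2)
print(percentages)
```

On a live Unix host the same inputs could be read with os.getloadavg() and os.cpu_count() from the standard library. Note that on Linux the load average also counts tasks in uninterruptible sleep, so it can sit above the CPU count while %id stays close to 100.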