Date: Sat, 10 Nov 2012 23:00:00 -0800
From: Eric Yang
To: chukwa-user@incubator.apache.org
Subject: Re: WaitingQueue - MemLimitQueue is full

Hi Logan,

It looks like the datanode is saturated while a large MapReduce job is in progress. The Chukwa agent will drop data on the floor if there is more data than the agent can buffer in memory. Are the collectors running on the datanodes? Do you have multiple disks for the datanodes? It may be good to map the number of disks to (task slots - 1) and let the Chukwa collector write to a disk that is not used concurrently by MapReduce tasks, to provide good performance for both data ingestion and data processing.

regards,
Eric
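Eric's explanation describes a buffer that is limited by total bytes rather than by number of entries. Once the collectors stop draining it fast enough, new data no longer fits. Below is a minimal sketch of that idea, assuming a simple reject-when-full policy. `ByteBudgetQueue` and its methods are hypothetical names for illustration. This is not Chukwa's `MemLimitQueue`, whose actual full-queue handling may differ (for example, it may block the producer instead of rejecting).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a memory-limited chunk buffer: it caps the total
// number of payload bytes held, not the number of chunks. Not Chukwa code.
public class ByteBudgetQueue {
    private final Deque<byte[]> chunks = new ArrayDeque<byte[]>();
    private final long maxBytes;     // budget for buffered payload bytes
    private long bufferedBytes = 0;  // bytes currently held

    public ByteBudgetQueue(long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    // Adds the chunk only if it fits in the remaining budget. Returns false
    // when the buffer is "full"; the caller decides whether to drop, retry or wait.
    public synchronized boolean tryAdd(byte[] chunk) {
        if (bufferedBytes + chunk.length > maxBytes) {
            return false; // over budget: the "queue is full" case
        }
        chunks.addLast(chunk);
        bufferedBytes += chunk.length;
        return true;
    }

    // Removes and returns the oldest chunk (e.g. after it was sent and acked),
    // or null if the buffer is empty.
    public synchronized byte[] poll() {
        byte[] chunk = chunks.pollFirst();
        if (chunk != null) {
            bufferedBytes -= chunk.length;
        }
        return chunk;
    }

    public synchronized long bufferedBytes() {
        return bufferedBytes;
    }

    public static void main(String[] args) {
        ByteBudgetQueue q = new ByteBudgetQueue(10);
        System.out.println(q.tryAdd(new byte[6])); // true: 6 of 10 bytes used
        System.out.println(q.tryAdd(new byte[6])); // false: 12 would exceed 10
        q.poll();                                  // drain one chunk, as a successful send would
        System.out.println(q.tryAdd(new byte[6])); // true again
    }
}
```

The buffer only recovers when something drains it. This matches the report that the messages stop as soon as cluster activity dies down and the collectors catch up.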

On Sat, Nov 10, 2012 at 2:17 PM, Logan Hardy <logan.hardy@33across.com> wrote:
We are running CentOS 5.4, Chukwa 0.3.0, java version "1.6.0_17", and are feeding a steady stream of data into our CDH3u3 Hadoop cluster. We have 6 Chukwa agent machines feeding 3 Chukwa collectors. Any time the cluster gets busy with a big job or with decommissioning a node, the Chukwa agent and collector start to back up and I start seeing "WaitingQueue - MemLimitQueue is full" messages in agent.log, as shown below. As soon as Hadoop cluster activity dies down, the MemLimitQueue messages go away and everything goes back to normal.

[root@COLL5 chukwa]# ps auxf | grep chukwa
root     11258  0.0  0.0  61172   732 pts/0    S+   15:15   0:00          \_ grep chukwa
root     29248  1.2  2.1 415572 86928 ?        Sl   04:03   8:04 /usr/java/default/bin/java -Xms32M -Xmx64M -DAPP=agent -Dlog4j.configuration=chukwa-log4j.properties -DCHUKWA_HOME=/usr/local/chukwa/bin/.. -DCHUKWA_CONF_DIR=/usr/local/chukwa/bin/../conf -DCHUKWA_LOG_DIR=/usr/local/chukwa/logs -classpath /usr/local/chukwa/bin/../conf::/usr/local/chukwa/bin/../chukwa-agent-0.3.0.jar:/usr/local/chukwa/bin/../chukwa-core-0.3.0.jar:/usr/local/chukwa/bin/../hadoopjars/hadoop-0.20.0-core.jar:/usr/local/chukwa/bin/../lib/NagiosAppender-1.5.0.jar:/usr/local/chukwa/bin/../lib/ant-1.7.1.jar:/usr/local/chukwa/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/chukwa/bin/../lib/asm-3.1.jar:/usr/local/chukwa/bin/../lib/commons-beanutils-1.8.0.jar:/usr/local/chukwa/bin/../lib/commons-cli-2.0-SNAPSHOT.jar:/usr/local/chukwa/bin/../lib/commons-codec-1.3.jar:/usr/local/chukwa/bin/../lib/commons-collections-3.1.jar:/usr/local/chukwa/bin/../lib/commons-fileupload-1.2.jar:/usr/local/chukwa/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/chukwa/bin/../lib/commons-io-1.4.jar:/usr/local/chukwa/bin/../lib/commons-lang-2.4.jar:/usr/local/chukwa/bin/../lib/commons-logging-1.1.1.jar:/usr/local/chukwa/bin/../lib/commons-logging-api-1.0.4.jar:/usr/local/chukwa/bin/../lib/commons-net-1.4.1.jar:/usr/local/chukwa/bin/../lib/core-3.1.1.jar:/usr/local/chukwa/bin/../lib/ezmorph-1.0.6.jar:/usr/local/chukwa/bin/../lib/jchronic-0.2.3.jar:/usr/local/chukwa/bin/../lib/jersey-bundle-1.1.0-ea.jar:/usr/local/chukwa/bin/../lib/jetty-6.1.11.jar:/usr/local/chukwa/bin/../lib/jetty-util-6.1.11.jar:/usr/local/chukwa/bin/../lib/json-lib-2.2.3-jdk15.jar:/usr/local/chukwa/bin/../lib/json.jar:/usr/local/chukwa/bin/../lib/jsp-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsp-api-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsr311-api-1.0.jar:/usr/local/chukwa/bin/../lib/junit-3.8.1.jar:/usr/local/chukwa/bin/../lib/log4j-1.2.13.jar:/usr/local/chukwa/bin/../lib/mysql-connector-java-5.1.6.jar:/usr/local/chukwa/bin/../lib/prefuse.jar:/usr/local/chukwa/bin/../lib/servlet-api-2.5-6.1.11.jar org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent


agent.log
........
2012-11-10 14:56:14,470 INFO Timer-0 ChukwaAgent - writing checkpoint 7257
2012-11-10 14:56:18,655 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 547
2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
2012-11-10 14:56:20,163 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]
2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2286662
2012-11-10 14:56:24,474 INFO Timer-0 ChukwaAgent - writing checkpoint 7258
2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
2012-11-10 14:56:27,294 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
2012-11-10 14:56:27,294 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8091188]
2012-11-10 14:56:27,302 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2214008
2012-11-10 14:56:29,476 INFO Timer-0 ChukwaAgent - writing checkpoint 7259
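To see how often the queue fills and how large each batch sent to the collector is, you can extract the two message types in the agent.log excerpt above. Below is a minimal sketch. It assumes the message text is exactly as shown (other Chukwa versions may word it differently) and reads the bracketed number as the bytes buffered at that moment, which is an interpretation, not documented behavior. The class `AgentLogScan` and its methods are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning agent.log lines; the patterns assume the
// exact message text seen in the excerpt.
public class AgentLogScan {
    private static final Pattern QUEUE_FULL =
            Pattern.compile("MemLimitQueue is full \\[(\\d+)\\]");
    private static final Pattern POST_LENGTH =
            Pattern.compile("HTTP post to \\S+ length = (\\d+)");

    // Returns the number captured by the pattern on each matching line.
    static List<Long> extract(List<String> lines, Pattern p) {
        List<Long> values = new ArrayList<Long>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                values.add(Long.parseLong(m.group(1)));
            }
        }
        return values;
    }

    public static List<Long> queueFullSizes(List<String> lines) {
        return extract(lines, QUEUE_FULL);
    }

    public static List<Long> postLengths(List<String> lines) {
        return extract(lines, POST_LENGTH);
    }

    // Parses the leading log4j timestamp, e.g. "2012-11-10 14:56:20,166", into epoch millis.
    public static long timestampMillis(String line) {
        if (line.length() < 23) {
            throw new IllegalArgumentException("line too short for a timestamp: " + line);
        }
        try {
            // New formatter per call: SimpleDateFormat is not thread-safe.
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
            return fmt.parse(line.substring(0, 23)).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("no timestamp in: " + line, e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]",
            "2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2286662",
            "2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8091188]",
            "2012-11-10 14:56:27,302 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2214008");
        System.out.println("queue sizes when full: " + queueFullSizes(lines));
        System.out.println("post lengths: " + postLengths(lines));
        // Rough send rate: first post's bytes over the time until the next post starts.
        long intervalMs = timestampMillis(lines.get(3)) - timestampMillis(lines.get(1));
        long bytesPerSecond = postLengths(lines).get(0) * 1000L / intervalMs;
        System.out.println("approx. bytes/s: " + bytesPerSecond);
    }
}
```

For the excerpt's two posts, 2286662 bytes followed by the next post 7136 ms later, this gives roughly 320 KB/s from this agent to the collector. That figure is worth comparing between busy and quiet periods on the cluster.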


Any ideas?

--
Logan Hardy | Operations Engineer
33Across | Follow us: Twitter | Facebook

o 801.231.4573


