incubator-chukwa-user mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: WaitingQueue - MemLimitQueue is full
Date Sun, 11 Nov 2012 07:00:00 GMT
Hi Logan,

It looks like the datanode is saturated when a large MapReduce job is in
progress. The Chukwa agent will drop data on the floor if there is more data
than the agent can buffer in memory. Are the collectors running on the
datanodes? Does each datanode have multiple disks? It may be good to set the
number of task slots to (number of disks - 1) and let the Chukwa collector
write to the one disk that is not used concurrently by MapReduce tasks, to
provide good performance for both data ingestion and data processing.

regards,
Eric

On Sat, Nov 10, 2012 at 2:17 PM, Logan Hardy <logan.hardy@33across.com> wrote:

> We are running CentOS 5.4, Chukwa 0.3.0, java version "1.6.0_17", and are
> feeding a steady stream of data into our CDH3u3 Hadoop cluster. We have 6
> Chukwa agent machines feeding 3 Chukwa collectors. Any time the cluster
> gets busy with a big job or the task of decommissioning a node the Chukwa
> agent and collector start to back up and I start seeing "WaitingQueue -
> MemLimitQueue is full" messages in the agent.log as shown below. As soon as
> Hadoop cluster activity dies down the MemLimitQueue messages go away and
> everything goes back to normal.
>
> [root@COLL5 chukwa]# ps auxf | grep chukwa
> root     11258  0.0  0.0  61172   732 pts/0    S+   15:15   0:00  \_ grep chukwa
> root     29248  1.2  2.1 415572 86928 ?        Sl   04:03   8:04
> /usr/java/default/bin/java -Xms32M -Xmx64M -DAPP=agent
> -Dlog4j.configuration=chukwa-log4j.properties
> -DCHUKWA_HOME=/usr/local/chukwa/bin/..
> -DCHUKWA_CONF_DIR=/usr/local/chukwa/bin/../conf
> -DCHUKWA_LOG_DIR=/usr/local/chukwa/logs -classpath
> /usr/local/chukwa/bin/../conf::/usr/local/chukwa/bin/../chukwa-agent-0.3.0.jar:/usr/local/chukwa/bin/../chukwa-core-0.3.0.jar:/usr/local/chukwa/bin/../hadoopjars/hadoop-0.20.0-core.jar:/usr/local/chukwa/bin/../lib/NagiosAppender-1.5.0.jar:/usr/local/chukwa/bin/../lib/ant-1.7.1.jar:/usr/local/chukwa/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/chukwa/bin/../lib/asm-3.1.jar:/usr/local/chukwa/bin/../lib/commons-beanutils-1.8.0.jar:/usr/local/chukwa/bin/../lib/commons-cli-2.0-SNAPSHOT.jar:/usr/local/chukwa/bin/../lib/commons-codec-1.3.jar:/usr/local/chukwa/bin/../lib/commons-collections-3.1.jar:/usr/local/chukwa/bin/../lib/commons-fileupload-1.2.jar:/usr/local/chukwa/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/chukwa/bin/../lib/commons-io-1.4.jar:/usr/local/chukwa/bin/../lib/commons-lang-2.4.jar:/usr/local/chukwa/bin/../lib/commons-logging-1.1.1.jar:/usr/local/chukwa/bin/../lib/commons-logging-api-1.0.4.jar:/usr/local/chukwa/bin/../lib/commons-net-1.4.1.jar:/usr/local/chukwa/bin/../lib/core-3.1.1.jar:/usr/local/chukwa/bin/../lib/ezmorph-1.0.6.jar:/usr/local/chukwa/bin/../lib/jchronic-0.2.3.jar:/usr/local/chukwa/bin/../lib/jersey-bundle-1.1.0-ea.jar:/usr/local/chukwa/bin/../lib/jetty-6.1.11.jar:/usr/local/chukwa/bin/../lib/jetty-util-6.1.11.jar:/usr/local/chukwa/bin/../lib/json-lib-2.2.3-jdk15.jar:/usr/local/chukwa/bin/../lib/json.jar:/usr/local/chukwa/bin/../lib/jsp-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsp-api-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsr311-api-1.0.jar:/usr/local/chukwa/bin/../lib/junit-3.8.1.jar:/usr/local/chukwa/bin/../lib/log4j-1.2.13.jar:/usr/local/chukwa/bin/../lib/mysql-connector-java-5.1.6.jar:/usr/local/chukwa/bin/../lib/prefuse.jar:/usr/local/chukwa/bin/../lib/servlet-api-2.5-6.1.11.jar
> org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
>
>
> agent.log
> ........
> 2012-11-10 14:56:14,470 INFO Timer-0 ChukwaAgent - writing checkpoint 7257
> 2012-11-10 14:56:18,655 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 547
> 2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
> 2012-11-10 14:56:20,163 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
> 2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
> *2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]*
> 2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2286662
> 2012-11-10 14:56:24,474 INFO Timer-0 ChukwaAgent - writing checkpoint 7258
> 2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
> 2012-11-10 14:56:27,294 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
> 2012-11-10 14:56:27,294 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
> *2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8091188]*
> 2012-11-10 14:56:27,302 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2214008
> 2012-11-10 14:56:29,476 INFO Timer-0 ChukwaAgent - writing checkpoint 7259
>
>
> Any ideas?
>
> --
> *Logan Hardy* | Operations Engineer
> 33Across <http://www.33across.com/>
>
> o 801.231.4573
>
