incubator-chukwa-user mailing list archives

From Logan Hardy <logan.ha...@33across.com>
Subject Re: WaitingQueue - MemLimitQueue is full
Date Sun, 11 Nov 2012 17:49:28 GMT
Eric,
  Thanks for your ideas on this. I've actually traced this issue to a
single saturated link in our datacenter. But you've given me some ideas on
how I can optimize this system some more. Thanks.

Logan

On Sun, Nov 11, 2012 at 12:00 AM, Eric Yang <eric818@gmail.com> wrote:

> Hi Logan,
>
> It looks like the datanode is saturated when a large mapreduce job is in
> progress.  The Chukwa agent will drop data on the floor if there is more
> data than the agent can buffer in memory.  Are the collectors running on
> the datanodes?  Do you have multiple disks per datanode?  It may be good to
> map the number of disks to (task slots - 1) and let the Chukwa collector
> write to a disk that is not used concurrently by mapreduce tasks, to
> provide good performance for both data injection and data processing.
>
> regards,
> Eric
>
> On Sat, Nov 10, 2012 at 2:17 PM, Logan Hardy <logan.hardy@33across.com> wrote:
>
>> We are running CentOS 5.4, Chukwa 0.3.0, and java version "1.6.0_17", and
>> are feeding a steady stream of data into our CDH3u3 Hadoop cluster. We have
>> 6 Chukwa agent machines feeding 3 Chukwa collectors. Any time the cluster
>> gets busy with a big job or with decommissioning a node, the Chukwa
>> agents and collectors start to back up and I start seeing "WaitingQueue -
>> MemLimitQueue is full" messages in agent.log, as shown below. As soon as
>> Hadoop cluster activity dies down, the MemLimitQueue messages go away and
>> everything goes back to normal.
>>
>> [root@COLL5 chukwa]# ps auxf | grep chukwa
>> root     11258  0.0  0.0  61172   732 pts/0    S+   15:15   0:00
>>  \_ grep chukwa
>> root     29248  1.2  2.1 415572 86928 ?        Sl   04:03   8:04
>> /usr/java/default/bin/java -Xms32M -Xmx64M -DAPP=agent
>> -Dlog4j.configuration=chukwa-log4j.properties
>> -DCHUKWA_HOME=/usr/local/chukwa/bin/..
>> -DCHUKWA_CONF_DIR=/usr/local/chukwa/bin/../conf
>> -DCHUKWA_LOG_DIR=/usr/local/chukwa/logs -classpath
>> /usr/local/chukwa/bin/../conf::/usr/local/chukwa/bin/../chukwa-agent-0.3.0.jar:/usr/local/chukwa/bin/../chukwa-core-0.3.0.jar:/usr/local/chukwa/bin/../hadoopjars/hadoop-0.20.0-core.jar:/usr/local/chukwa/bin/../lib/NagiosAppender-1.5.0.jar:/usr/local/chukwa/bin/../lib/ant-1.7.1.jar:/usr/local/chukwa/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/chukwa/bin/../lib/asm-3.1.jar:/usr/local/chukwa/bin/../lib/commons-beanutils-1.8.0.jar:/usr/local/chukwa/bin/../lib/commons-cli-2.0-SNAPSHOT.jar:/usr/local/chukwa/bin/../lib/commons-codec-1.3.jar:/usr/local/chukwa/bin/../lib/commons-collections-3.1.jar:/usr/local/chukwa/bin/../lib/commons-fileupload-1.2.jar:/usr/local/chukwa/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/chukwa/bin/../lib/commons-io-1.4.jar:/usr/local/chukwa/bin/../lib/commons-lang-2.4.jar:/usr/local/chukwa/bin/../lib/commons-logging-1.1.1.jar:/usr/local/chukwa/bin/../lib/commons-logging-api-1.0.4.jar:/usr/local/chukwa/bin/../lib/commons-net-1.4.1.jar:/usr/local/chukwa/bin/../lib/core-3.1.1.jar:/usr/local/chukwa/bin/../lib/ezmorph-1.0.6.jar:/usr/local/chukwa/bin/../lib/jchronic-0.2.3.jar:/usr/local/chukwa/bin/../lib/jersey-bundle-1.1.0-ea.jar:/usr/local/chukwa/bin/../lib/jetty-6.1.11.jar:/usr/local/chukwa/bin/../lib/jetty-util-6.1.11.jar:/usr/local/chukwa/bin/../lib/json-lib-2.2.3-jdk15.jar:/usr/local/chukwa/bin/../lib/json.jar:/usr/local/chukwa/bin/../lib/jsp-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsp-api-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsr311-api-1.0.jar:/usr/local/chukwa/bin/../lib/junit-3.8.1.jar:/usr/local/chukwa/bin/../lib/log4j-1.2.13.jar:/usr/local/chukwa/bin/../lib/mysql-connector-java-5.1.6.jar:/usr/local/chukwa/bin/../lib/prefuse.jar:/usr/local/chukwa/bin/../lib/servlet-api-2.5-6.1.11.jar
>> org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
>>
>>
>> agent.log
>> ........
>> 2012-11-10 14:56:14,470 INFO Timer-0 ChukwaAgent - writing checkpoint 7257
>> 2012-11-10 14:56:18,655 INFO Timer-1 HttpConnector - # http chunks ACK'ed
>> since last report: 547
>> 2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - >>>>>>
>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response
>> length 832
>> 2012-11-10 14:56:20,163 INFO HTTP post thread HttpConnector - sent 13
>> chunks, got back 13 acks
>> 2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender -
>> collected 13 chunks
>> *2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is
>> full [8119214]*
>> 2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>>
>> HTTP post to http://10.5.200.204:8080/ length = 2286662
>> 2012-11-10 14:56:24,474 INFO Timer-0 ChukwaAgent - writing checkpoint 7258
>> 2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>>
>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response
>> length 832
>> 2012-11-10 14:56:27,294 INFO HTTP post thread HttpConnector - sent 13
>> chunks, got back 13 acks
>> 2012-11-10 14:56:27,294 INFO HTTP post thread ChukwaHttpSender -
>> collected 13 chunks
>> *2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is
>> full [8091188]*
>> 2012-11-10 14:56:27,302 INFO HTTP post thread ChukwaHttpSender - >>>>>>
>> HTTP post to http://10.5.200.204:8080/ length = 2214008
>> 2012-11-10 14:56:29,476 INFO Timer-0 ChukwaAgent - writing checkpoint 7259
>>
>>
>> Any ideas?
>>
>> --
>> --
>> *Logan Hardy *| Operations Engineer
>> 33Across <http://www.33across.com/> | Follow us: Twitter<http://www.twitter.com/33across>
>>  | Facebook <http://www.facebook.com/33across>
>>
>> o 801.231.4573
>>
>> *Learn about our Q1 Brand Graph Category Insights Report<http://www.33across.com/BrandGraph/33Across_BrandGraph_AQ1_2012.pdf>
>> *
>> *
>> 33Across and Tynt in the News
>> *AdWeek • AllThingsD • Bloomberg • Forbes • TechCrunch • VentureBeat •
>> WSJ <http://33across.com/news.php#axzz1uqxl0v16>
>>
>>
>
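For anyone landing on this thread with the same symptom, the agent.log excerpt quoted above can be summarized with a short script. This is only a sketch: the regular expressions are modeled on the lines shown in this message, it assumes each log entry occupies a single line in the real file (the email wrapped them), and it assumes the bracketed number after "MemLimitQueue is full" is the queued byte count, which the thread does not confirm. The names `summarize`, `QUEUE_FULL`, `POST_START` and `POST_DONE` are made up for illustration.

```python
import re
from datetime import datetime

# Patterns modeled on the agent.log lines quoted in this thread.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
QUEUE_FULL = re.compile(TS + r" INFO .*?WaitingQueue - MemLimitQueue is full \[(\d+)\]")
POST_START = re.compile(TS + r" INFO .*?ChukwaHttpSender - >+ HTTP post to \S+ length = (\d+)")
POST_DONE = re.compile(TS + r" INFO .*?ChukwaHttpSender - >+ HTTP Got success back")


def parse_ts(text):
    """Parse a log4j-style timestamp such as '2012-11-10 14:56:20,166'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def summarize(lines):
    """Return (queue_full_values, post_rates_bytes_per_sec) for log lines."""
    full_values = []   # number reported in each "MemLimitQueue is full [N]" line
    rates = []         # bytes per second for each post with a start and a success line
    pending = None     # (start_time, length) of the HTTP post currently in flight
    for line in lines:
        m = QUEUE_FULL.search(line)
        if m:
            full_values.append(int(m.group(2)))
            continue
        m = POST_START.search(line)
        if m:
            pending = (parse_ts(m.group(1)), int(m.group(2)))
            continue
        m = POST_DONE.search(line)
        if m and pending:
            elapsed = (parse_ts(m.group(1)) - pending[0]).total_seconds()
            if elapsed > 0:
                rates.append(pending[1] / elapsed)
            pending = None
    return full_values, rates


# Three entries from the excerpt above, rejoined onto single lines.
SAMPLE = [
    "2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]",
    "2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> "
    "HTTP post to http://10.5.200.204:8080/ length = 2286662",
    "2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>> "
    "HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832",
]

if __name__ == "__main__":
    values, rates = summarize(SAMPLE)
    print("queue-full values:", values)
    print("post rates (bytes/s):", [round(r) for r in rates])
```

In the excerpt, the one post whose start and success lines are both visible moved 2,286,662 bytes in about 7.1 seconds, roughly 320 kB/s. Tracking that rate over time is one way to tell whether the agent is being held back by the path to the collector.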


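Eric's point that the agent can only hold so much data in memory can be illustrated with a queue bounded by total bytes rather than by item count. The sketch below is a generic illustration and not Chukwa's MemLimitQueue implementation; the class name `ByteLimitedQueue` and its methods are hypothetical. When the sender side drains more slowly than data arrives, `offer` starts returning False and the caller has to wait or discard the chunk.

```python
import threading
from collections import deque


class ByteLimitedQueue:
    """Illustrative queue bounded by total payload bytes, not item count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = deque()
        self._lock = threading.Lock()

    def offer(self, chunk):
        """Enqueue chunk if it fits; return False when the queue is full."""
        with self._lock:
            if self.used_bytes + len(chunk) > self.max_bytes:
                return False
            self._items.append(chunk)
            self.used_bytes += len(chunk)
            return True

    def drain(self, max_bytes):
        """Remove and return queued chunks totalling at most max_bytes."""
        out, taken = [], 0
        with self._lock:
            while self._items and taken + len(self._items[0]) <= max_bytes:
                chunk = self._items.popleft()
                taken += len(chunk)
                self.used_bytes -= len(chunk)
                out.append(chunk)
        return out


if __name__ == "__main__":
    q = ByteLimitedQueue(max_bytes=10)
    print(q.offer(b"12345"))    # fits within the 10-byte budget
    print(q.offer(b"123456"))   # would exceed the budget, so it is rejected
    print(q.drain(5))           # the sender takes what is queued
    print(q.offer(b"123456"))   # space is available again
```

Bounding by bytes keeps memory use predictable when chunk sizes vary, which matters for a process started with a small heap such as the `-Xmx64M` agent shown in the `ps` output.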