cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: frequent client exceptions on 0.7.0
Date Thu, 17 Feb 2011 20:22:52 GMT
Messages being dropped means the node is overloaded. Look at the thread pool stats to
see which thread pools have queues. It may be IO related, so also check the read and write
latency on the CF and use iostat.

I would try those first, then jump into GC land.

Aaron

On 18/02/2011, at 4:43 AM, Dan Hendry <dan.hendry.junk@gmail.com> wrote:

> Try turning on GC logging in cassandra-env.sh, specifically:
> 
>    -XX:+PrintGCApplicationStoppedTime
>    -Xloggc:/var/log/cassandra/gc.log
> 
> Look for things like: "Total time for which application threads were
> stopped: 52.8795600 seconds". Anything over a few seconds may be
> causing your problem.
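
To make the check above concrete, here is a minimal Python sketch that pulls the long pauses out of GC log lines. It assumes the log contains the "Total time for which application threads were stopped" line exactly as quoted above; the function name `long_pauses`, the 2-second default threshold, and the first sample line are illustrative assumptions, not something from the thread.

```python
import re

# Matches the line written when -XX:+PrintGCApplicationStoppedTime is enabled,
# using the wording quoted in the message above.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_seconds=2.0):
    """Return pause durations (seconds) longer than threshold_seconds.

    The 2.0 default is an assumed stand-in for "a few seconds".
    """
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds > threshold_seconds:
                pauses.append(seconds)
    return pauses


# The first line is a made-up short pause; the second is the example from the thread.
sample = [
    "Total time for which application threads were stopped: 0.0001230 seconds",
    "Total time for which application threads were stopped: 52.8795600 seconds",
]
print(long_pauses(sample))
```

To run it against a real node, pass an open file handle for the path given to -Xloggc (here /var/log/cassandra/gc.log) instead of `sample`.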
> 
> Stop-the-world GC is a real pain. In my cluster I was, and still am to some
> extent, seeing each node go 'down' about 10-30 times a day and up to a few
> hundred when running major compactions (found by grepping through the Cassandra
> system log). GC tuning is an art unto itself, but if this is your problem,
> try:
>    - lower memtable flush thresholds
>    - reduce new gen size (which is explicitly set in 0.7.1+, the -Xmn
> setting)
>    - reduce CMSInitiatingOccupancyFraction from 75 to 60 or so (maybe
> less)
>    - set -XX:ParallelGCThreads=<NUMBER OF CPU CORES>
>    - set -XX:ParallelCMSThreads=<NUMBER OF CPU CORES>
> 
> Again, I would recommend you do some more research into GC tuning
> (http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html is a
> good place to start). Most of my recommendations above will probably reduce
> the chance of your nodes going 'down' but may have pretty severe negative
> performance impacts. In my cluster, I found the measures needed to ensure
> the node never (or rarely; it can't be completely prevented) went down just
> were not worth it. I have ended up running the nodes closer to the wire and
> living with an increased rate of client side exceptions and nodes going down
> for short periods.
> 
> Dan
> 
> -----Original Message-----
> From: Andy Skalet [mailto:aeskalet@bitjug.com] 
> Sent: February-17-11 4:18
> To: Peter Schuller
> Cc: user@cassandra.apache.org
> Subject: Re: frequent client exceptions on 0.7.0
> 
> On Thu, Feb 17, 2011 at 12:37 AM, Peter Schuller
> <peter.schuller@infidyne.com> wrote:
>> Bottom line: Check /var/log/cassandra/system.log to begin with and see
>> if it's reporting anything or being restarted.
> 
> Thanks, Peter.
> 
> In the system.log, I see quite a few of these across several machines.
> Everything else in the log is INFO level.
> 
> WARN [ScheduledTasks:1] 2011-02-17 07:19:47,491 MessagingService.java
> (line 545) Dropped 182 READ messages in the last 5000ms
> WARN [ScheduledTasks:1] 2011-02-17 08:10:06,142 MessagingService.java
> (line 545) Dropped 31 READ messages in the last 5000ms
> WARN [ScheduledTasks:1] 2011-02-17 08:11:12,237 MessagingService.java
> (line 545) Dropped 54 READ messages in the last 5000ms
> WARN [ScheduledTasks:1] 2011-02-17 08:11:17,392 MessagingService.java
> (line 545) Dropped 487 READ messages in the last 5000ms
> 
> The machines are in EC2 with firewall permission to talk to each
> other, so while not the most solid of network environments, at least
> pretty common these days.  System is not going down, and cassandra
> process is not dying.
> 
> Andy
> 
