cassandra-user mailing list archives

From Nicolas Lalevée <>
Subject Re: Cassandra timeout whereas it is not much busy
Date Mon, 21 Jan 2013 14:28:11 GMT
On 17 Jan 2013, at 05:00, aaron morton <> wrote:

> Check the disk utilisation using iostat -x 5
> If you are on a VM / in the cloud check for CPU steal. 
> Check the logs for messages from the GCInspector; the ParNew events are times when the JVM
> is paused.
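
A minimal sketch of how the GCInspector pauses could be totalled per minute, so that bursts of ParNew pauses stand out. The log line shape in `SAMPLE` is an assumption about what 1.0.x writes to system.log, and the helper names are made up for the illustration; check the pattern against the real log before trusting the numbers.

```python
import re

# Assumed shape of a GCInspector line; adjust the pattern to your own system.log.
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} GCInspector\.java.*"
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms"
)


def gc_pauses(lines):
    """Yield (timestamp, collector, pause_ms) for each GCInspector line."""
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            yield match.group("ts"), match.group("collector"), int(match.group("ms"))


def pause_ms_per_minute(lines, collector="ParNew"):
    """Sum the pause time in milliseconds per minute for one collector."""
    totals = {}
    for ts, name, ms in gc_pauses(lines):
        if name == collector:
            minute = ts[:16]  # keep "YYYY-MM-DD HH:MM"
            totals[minute] = totals.get(minute, 0) + ms
    return totals


# Made-up sample lines, for illustration only.
SAMPLE = [
    " INFO [ScheduledTasks:1] 2013-01-17 05:00:01,123 GCInspector.java (line 122) GC for ParNew: 235 ms, 1000 reclaimed leaving 2000 used; max is 3000",
    " INFO [ScheduledTasks:1] 2013-01-17 05:00:40,456 GCInspector.java (line 122) GC for ParNew: 310 ms, 1000 reclaimed leaving 2000 used; max is 3000",
    " INFO [ScheduledTasks:1] 2013-01-17 05:01:05,789 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 1200 ms, 1000 reclaimed leaving 2000 used; max is 3000",
    " INFO [ScheduledTasks:1] 2013-01-17 05:01:30,000 GCInspector.java (line 122) GC for ParNew: 205 ms, 1000 reclaimed leaving 2000 used; max is 3000",
    " INFO [FlushWriter:1] 2013-01-17 05:01:31,000 Memtable.java (line 1) unrelated line",
]

if __name__ == "__main__":
    for minute, total in sorted(pause_ms_per_minute(SAMPLE).items()):
        print(minute, total, "ms paused in ParNew")
```

Real input would come from something like `open("system.log")`, with the path depending on the installation.
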

I have seen logs about that. I didn't worry much, since the JVM's GC was not under pressure.
As far as I understand, unless a CF is "continuously" flushed, it should not be a major issue,
should it?
I don't know for sure whether there was a lot of flushing though, since my nodes were not properly

> Look at the times dropped messages are logged and try to correlate them with other server

I tried that without much success. My graphs are in Cacti, though, so it is quite hard to
see when things happen simultaneously across several graphs.
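
Since lining things up by eye across several graphs is hard, one option is to do the correlation on timestamps directly. The following is only a sketch: the event labels, the sample timestamps, and the 30-second window are all assumptions made for the illustration, and the timestamps would have to be extracted from the logs first.

```python
from datetime import datetime, timedelta


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def correlate(dropped, others, window_s=30):
    """Map each dropped-message timestamp to the other events within window_s seconds."""
    window = timedelta(seconds=window_s)
    result = {}
    for dropped_ts in dropped:
        when = parse(dropped_ts)
        result[dropped_ts] = [
            (label, ts) for label, ts in others
            if abs(parse(ts) - when) <= window
        ]
    return result


# Made-up timestamps, for illustration only.
DROPPED = ["2013-01-17 05:00:30", "2013-01-17 05:10:00"]
OTHERS = [
    ("gc_pause", "2013-01-17 05:00:10"),
    ("flush", "2013-01-17 05:00:55"),
    ("flush", "2013-01-17 05:20:00"),
]

if __name__ == "__main__":
    for dropped_ts, nearby in correlate(DROPPED, OTHERS).items():
        print(dropped_ts, "->", nearby or "nothing nearby")
```

Dropped-message timestamps that repeatedly land next to the same kind of event (a GC pause, a flush) are the ones worth a closer look.
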

> If you have a lot of secondary indexes, or a lot of memtables flushing at the same time,
> you may be blocking behind the global Switch Lock. If you use secondary indexes make sure
> the memtable_flush_queue_size is set correctly, see the comments in the yaml file.

I have no secondary indexes.

> If you have a lot of CFs flushing at the same time, and there are no messages from
> the "MeteredFlusher", it may be that the log segment is too big for the number of CFs you have.
> When the segment needs to be recycled, all dirty CFs are flushed; if you have a lot of CFs
> this can result in blocking around the switch lock. Try reducing the commitlog_segment_size_in_mb
> so that fewer CFs are flushed.

What is "a lot"? We have 26 CFs. 9 are barely used. 15 contain time series data (Cassandra
rocks with them), of which only 3 see from 1 to 10 reads or writes per second. 1 is quite
hot (200 reads/s) and is mainly used for its bloom filter (whose "disk size" is about 1G).
And 1, also hot, is used only for writes (it has the same big bloom filter, which I am about
to remove since it is useless).

BTW, thanks for the pointers. I have not yet tried to put our nodes under pressure. But when
I do, I'll look at those pointers closely.


> Hope that helps
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> @aaronmorton
> On 17/01/2013, at 10:30 AM, Nicolas Lalevée <> wrote:
>> Hi,
>> I am seeing a strange behavior that I am not able to understand.
>> I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a replication
>> factor of 3.
>> ---------------
>> My story is maybe too long, so here is a shorter version; I am keeping what I wrote below
>> in case someone has the patience to read my bad English ;)
>> I got into a situation where my cluster was generating a lot of timeouts on our
>> frontend, whereas I could not see any major trouble in the internal stats. Actually cpu, read
>> & write counts on the column families were quite low. It was a mess until I switched from java7
>> to java6 and forced the use of jamm. After the switch, cpu, read & write counts were
>> going up again, and the timeouts were gone. I have seen this behavior while reducing the xmx too.
>> What could be blocking Cassandra from utilizing the whole resources of the machine?
>> Are there metrics I didn't see which could explain this?
>> ---------------
>> Here is the long story.
>> When I first set my cluster up, I blindly gave 6G of heap to the Cassandra nodes,
>> thinking that the more a java process has, the smoother it runs, while keeping some RAM for the
>> disk cache. We got a new feature deployed, and things went to hell, with some machines
>> up to 60% of wa. I give credit to Cassandra because there were not that many timeouts received
>> on the web frontend; it was kind of slow but it was kind of working. With some optimizations,
>> we reduced the pressure of the new feature, but it was still at 40%wa.
>> At that time I didn't have much monitoring, just heap and cpu. I read some articles on
>> how to tune, and I learned that the disk cache is quite important because Cassandra relies
>> on it as the read cache. So I tried many xmx values, and 3G seems to be about the lowest possible.
>> So on 2 of the 6 nodes, I set xmx to 3,3G. Amazingly, I saw the wa go down to 10%. Quite
>> happy with that, I changed the xmx to 3,3G on each node. But then things really went to hell,
>> with a lot of timeouts on the frontend. It was not working at all. So I rolled back.
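
The trade-off described above comes down to simple arithmetic: whatever the heap takes out of the 8G is no longer available to the disk cache. A rough sketch, where `other_gb` is a hypothetical allowance for off-heap JVM memory and the OS (a placeholder, not a measured value):

```python
def page_cache_budget_gb(ram_gb, heap_gb, other_gb=0.5):
    """Rough upper bound, in GB, on the RAM left over for the OS page cache.

    other_gb is a hypothetical allowance for JVM off-heap memory and the OS
    itself; the 0.5 default is a placeholder, not a measured value.
    """
    return ram_gb - heap_gb - other_gb


if __name__ == "__main__":
    # The two heap sizes mentioned in the message, on an 8G node.
    for heap in (6.0, 3.3):
        budget = page_cache_budget_gb(ram_gb=8.0, heap_gb=heap)
        print(f"heap {heap:.1f}G -> at most ~{budget:.1f}G for the page cache")
```

Under these assumptions, going from a 6G heap to a 3,3G heap raises the page cache ceiling from about 1.5G to about 4.2G, which is consistent with the drop in %wa but does not by itself explain the timeouts.
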
>> After some time, probably because the data of the new feature grew to a nominal
>> size, things went again to very high %wa, and Cassandra was not able to keep up. So we
>> kind of reverted the feature; the column family is still used, but only by one thread on the
>> frontend. The wa was reduced to 20%, but things continued to not work properly: from time
>> to time, a bunch of timeouts are raised on our frontend.
>> In the meantime, I took time to do some proper monitoring of Cassandra: column family
>> read & write counts, latency, memtable size, but also the dropped messages, the pending
>> tasks, the timeouts between nodes. It's just a start but it gives me a first nice view of
>> what is actually going on.
>> I tried again reducing the xmx on one node. Cassandra is not complaining about not having
>> enough heap, memtables are not flushed insanely every second, the number of reads and writes
>> is reduced compared to the other nodes, the cpu is lower too, there are not many pending tasks,
>> and no more than 1 or 2 messages are dropped from time to time. Everything indicates that there is
>> probably room for more work, but the node doesn't take it. Even its read and write latencies
>> are lower than on the other nodes. But if I keep this xmx long enough, timeouts
>> start to rise on the frontends.
>> After some individual node experiments, the cluster was starting to be quite "sick".
>> Even with 6G, the %wa was going down, and read and write counts too, on pretty much every node. And
>> more and more timeouts were raised on the frontend.
>> The only worrying thing I could see was the heap climbing slowly above the 75%
>> threshold and from time to time suddenly dropping from 95% to 70%. I looked at the full gc
>> counter: not much pressure.
>> Another thing was some "Timed out replaying hints to /; aborting further
>> deliveries" in the log. But it is logged as info, so I guess it is not very important.
>> After some long, useless staring at the monitoring graphs, I gave a try to using
>> openjdk 6b24 rather than openjdk 7u9, and forced Cassandra to load jamm, since in 1.0 the init
>> script blacklists the openjdk. Node after node, I saw that the heap was behaving more like what
>> I am used to seeing on JVM-based apps: some nice ups and downs rather than a long and slow climb. But
>> read and write counts were still low on every node, and timeouts were still bursting on our
>> It was a continuing mess until I restarted the "first" node of the cluster. There was still
>> one node to switch to java6 + jamm, but as soon as I restarted my "first" node, every node started
>> working more: %wa climbing, read & write counts climbing, no more timeouts on the frontend,
>> the frontend then being fast as hell.
>> I understand that my cluster is probably under capacity. But I don't understand how,
>> since there is something within Cassandra which might block the full use of the machine's resources.
>> It seems kind of related to the heap, but I don't know how. Any idea?
>> I intend to start monitoring more metrics, but do you have any hint on which ones could
>> explain that behavior?
>> Nicolas
