Thanks a lot. It seems that a fix has been committed and will appear in the next release, so I won't need my own patched Cassandra :)

Hi Vitalii,

I sent the patch.

Glad you've got it working properly. I tried to keep my changes as "local" as possible, so I changed only a single value calculation. But it's possible your way is better and will be accepted by the Cassandra maintainer. Could you attach your patch to the ticket? I'd like any fix to be applied to trunk, since currently I have to make my own patched build each time I upgrade because of the bug.

I agree with your observations.
On the other hand, I found that ColumnFamily.size() doesn't calculate the object size correctly. It doesn't count the sizes of two object fields, and it returns 0 if there is no object in the columns container.
I increased the initial value of the size variable to 24, which is the size of two objects (I didn't know what the correct value is), and Cassandra started calculating the live ratio correctly, increasing the throughput value and flushing memtables.
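
To illustrate the change described above, here is a minimal sketch. It is not the real ColumnFamily code: the class and method names are made up for this example, and 24 is just the guessed value mentioned above, not a verified object size.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Cassandra's actual ColumnFamily.size().
class SizeSketch {
    // Assumption: 24 bytes for the two object fields (a guess, as noted above).
    static final long ASSUMED_FIELD_OVERHEAD = 24;

    // Behaviour as described: only column sizes are summed, so an empty container yields 0.
    static long sizeWithoutOverhead(List<Long> columnSizes) {
        long size = 0;
        for (long s : columnSizes) {
            size += s;
        }
        return size;
    }

    // Patched idea: start from a non-zero base so an empty container is still accounted for.
    static long sizeWithOverhead(List<Long> columnSizes) {
        long size = ASSUMED_FIELD_OVERHEAD;
        for (long s : columnSizes) {
            size += s;
        }
        return size;
    }

    public static void main(String[] args) {
        List<Long> noColumns = Collections.emptyList();
        System.out.println(sizeWithoutOverhead(noColumns)); // prints 0
        System.out.println(sizeWithOverhead(noColumns));    // prints 24
    }
}
```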

On Tue, Apr 24, 2012 at 2:00 AM, Vitalii Tymchyshyn <> wrote:

To me, the "there are no dirty column families" line in your message suggests it's possibly the same problem.
The issue is that column families that get only full-row deletes do not get A SINGLE dirty byte accounted, and so can't be picked by the flusher. No ratio can help, simply because it is multiplied by 0. Check your cfstats.
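
A rough sketch of the arithmetic behind that point. The names here are hypothetical and this is not the actual flusher code; it only assumes the estimate is "accounted dirty bytes times live ratio", as described above.

```java
// Hypothetical illustration only; not Cassandra's actual flush accounting.
class DirtyBytesSketch {
    // Assumed estimate: accounted dirty bytes scaled by the live ratio.
    static long estimatedLiveBytes(long dirtyBytes, double liveRatio) {
        return (long) (dirtyBytes * liveRatio);
    }

    // Assumed rule: a column family is a flush candidate only if its estimate is non-zero.
    static boolean isFlushCandidate(long dirtyBytes, double liveRatio) {
        return estimatedLiveBytes(dirtyBytes, liveRatio) > 0;
    }

    public static void main(String[] args) {
        // Only full-row deletes: nothing accounted, so even the maximum ratio gives 0.
        System.out.println(estimatedLiveBytes(0, 64.0));  // prints 0
        System.out.println(isFlushCandidate(0, 64.0));    // prints false
        // Any non-zero accounting makes the column family visible again.
        System.out.println(isFlushCandidate(1024, 1.5));  // prints true
    }
}
```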

Thank you Vitalii.

Looking at Jonathan's answer to your patch, I think it's probably not my case. I see that the live ratio is calculated in my case, but the calculations look strange:

WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 (line 181) setting live ratio to maximum of 64 instead of Infinity
INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 (line 186) CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0 (just-counted was 64.0). calculation took 63355ms for 0 columns

A comment in the code says "If it gets higher than 64 something is probably broken.", so this looks like it's probably the problem.
I'm not sure how to investigate it.
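
My guess at how a ratio of Infinity can come up, as a sketch only. The names are made up and this is not the real MemoryMeter code; it assumes the ratio is "measured size divided by accounted size" and that the result is capped at 64, as the log message suggests.

```java
// Hypothetical illustration only; not Cassandra's actual live ratio calculation.
class LiveRatioSketch {
    // The cap of 64 comes from the log message above.
    static final double MAX_LIVE_RATIO = 64.0;

    // Assumed formula: measured in-memory bytes divided by accounted bytes.
    static double liveRatio(long measuredBytes, long accountedBytes) {
        // Floating-point division: a positive value divided by 0 gives Infinity.
        double ratio = (double) measuredBytes / accountedBytes;
        // Cap the result. Note that 0 / 0 would give NaN, which this sketch does not handle.
        return Math.min(ratio, MAX_LIVE_RATIO);
    }

    public static void main(String[] args) {
        System.out.println(liveRatio(100, 50)); // prints 2.0
        System.out.println(liveRatio(100, 0));  // prints 64.0 (Infinity before the cap)
    }
}
```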

I did post a fix there that helped me.

2012/4/24 crypto five <>

I have 50 million rows in a column family on a box with 4 GB of RAM. I allocated 2 GB to Cassandra.
I have a program which traverses this CF and cleans up some data there; it generates about 20k delete statements per second.
After about 3 million deletions Cassandra stops responding to queries: it doesn't react to the CLI, nodetool, etc.
I see in the logs that it tries to free some memory but can't, even if I wait a whole day.
I also see the following in the logs:

INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 (line 2647) Unable to reduce heap usage since there are no dirty column families

When I look at a memory dump, I see that memory goes to ConcurrentSkipListMap (10%), HeapByteBuffer (13%), DecoratedKey (6%), int[] (6%), BigInteger (8.2%), ConcurrentSkipListMap$HeadIndex (7.2%), ColumnFamily (6.5%), ThreadSafeSortedColumns (13.7%), long[] (5.9%).

What can I do to make Cassandra stop dying?
Why can't it free the memory?
Any ideas?

Thank you.

