incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: CPU hotspot at BloomFilterSerializer#deserialize
Date Fri, 01 Feb 2013 17:55:07 GMT
> 5. the problematic Data file contains data for only 5 to 10 keys, but is large (2.4 GB)
So, very large rows?
What does nodetool cfstats or cfhistograms say about the row sizes?
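
For example (host, keyspace and column family names are placeholders):

    nodetool -h <host> cfstats
    nodetool -h <host> cfhistograms <keyspace> <column_family>

In the cfstats output, the compacted row mean / maximum size lines for that column family should show whether you really have multi-gigabyte rows.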


> 1. what is happening?

I think this is partly the large rows and partly the query pattern. This post is only roughly
correct now but still applies: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and see my talk here http://www.datastax.com/events/cassandrasummit2012/presentations

> 3. any more info required to proceed?

Do some tests with different query techniques (there is a rough sketch after the list below)…

Get a single named column. 
Get the first 10 columns using the natural column order.
Get the last 10 columns using the reversed order. 
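
For example, with the raw Thrift client. This is only a sketch: the column family name "MyCF", the row key, the column name and the connection setup are placeholders you would swap for your own.

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class QueryTest
    {
        // assumes an open Cassandra.Client with the keyspace already set
        static void run(Cassandra.Client client) throws Exception
        {
            ByteBuffer key = ByteBuffer.wrap("some-row-key".getBytes("UTF-8"));
            ColumnParent parent = new ColumnParent("MyCF");

            // 1. a single named column
            ColumnPath path = new ColumnPath("MyCF").setColumn("some-column".getBytes("UTF-8"));
            ColumnOrSuperColumn one = client.get(key, path, ConsistencyLevel.ONE);

            // 2. the first 10 columns in natural column order
            SlicePredicate first10 = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 10));
            List<ColumnOrSuperColumn> head = client.get_slice(key, parent, first10, ConsistencyLevel.ONE);

            // 3. the last 10 columns, using reversed order
            SlicePredicate last10 = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), true, 10));
            List<ColumnOrSuperColumn> tail = client.get_slice(key, parent, last10, ConsistencyLevel.ONE);

            System.out.println(one + " / " + head.size() + " / " + tail.size());
        }
    }

Time each of those against a key that lives on one of the problem sstables and against a key on a healthy sstable, and compare.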

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 7:20 PM, Takenori Sato <tsato@cloudian.com> wrote:

> Hi all,
> 
> We have a situation where the CPU load on some of our nodes in the cluster has spiked occasionally
since last November, triggered by requests for rows that reside on two specific
sstables.
> 
> We confirmed the following (when the load spiked):
> 
> version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8
> jdk: Oracle 1.6.0
> 
> 1. profiling showed that BloomFilterSerializer#deserialize was the hotspot (70% of the
total load across running threads)
> 
> * the stack trace looked like this (simplified)
> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
> ...
> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
> ...
> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
> ...
> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
> 66.7% - java.io.DataInputStream.readLong
> 
> 2. normally, #1 should be so fast that sampling-based profiling cannot detect it
> 
> 3. no pressure on Cassandra's JVM heap, nor on the machine overall
> 
> 4. only a little I/O traffic on our 8 disks/node (up to 100 tps/disk according to "iostat 1 1000")
> 
> 5. the problematic Data file contains data for only 5 to 10 keys, but is large (2.4 GB)
> 
> 6. the problematic Filter file is only 256 B (which could be normal)
> 
> 
> So now I am trying to read the Filter file in the same way BloomFilterSerializer#deserialize
does, as closely as I can, in order to see if there is something wrong with the file.
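
(For reference, a rough sketch of doing that with the classes from a 1.0.x source tree; BloomFilter.serializer() is the entry point the server uses when it opens the -Filter.db component, but check the exact package and method names against your checkout:)

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import org.apache.cassandra.utils.BloomFilter;

    public class FilterCheck
    {
        public static void main(String[] args) throws Exception
        {
            // args[0] = path to the suspect *-Filter.db file
            DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])));
            try
            {
                // the same deserialize call the server makes when opening the sstable
                BloomFilter bf = BloomFilter.serializer().deserialize(in);
                System.out.println("deserialized ok: " + bf);
            }
            finally
            {
                in.close();
            }
        }
    }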
> 
> Could you give me some advice on:
> 
> 1. what is happening?
> 2. the best way to simulate BloomFilterSerializer#deserialize
> 3. any more info required to proceed?
> 
> Thanks,
> Takenori

