incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: CPU hotspot at BloomFilterSerializer#deserialize
Date Mon, 04 Feb 2013 17:29:45 GMT
> Yes, it contains a big row that goes up to 2GB with more than a million columns.

I've run tests with 10 million small columns and seen reasonable performance. I've not looked at 1 million large columns.

>> - BloomFilterSerializer#deserialize does readLong iteratively at each page
>> of size 4K for a given row, which means it could be 500,000 loops (calls to
>> readLong) for a 2G row (from the 1.0.7 source).
There is only one Bloom filter per row in an SSTable, not one per column index/page. 
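The difference between the two readings can be put in numbers. The 500,000 figure comes from dividing a 2 GiB row into 4 KiB pages; with one Bloom filter per row, the readLong count is instead the number of 64-bit words in that single filter's bitset. The sketch below computes both. The class name and the `bitsPerColumn` value of 15 are hypothetical assumptions for illustration, not Cassandra's actual sizing.

```java
class BloomFilterReadCount {
    // Takenori's reading: one readLong loop per 4 KiB page of row data.
    static long perPageEstimate(long rowBytes, long pageBytes) {
        return rowBytes / pageBytes;
    }

    // Aaron's correction: a row has a single Bloom filter, so the readLong
    // count is the number of 64-bit words in that one bitset.
    // bitsPerColumn is a HYPOTHETICAL sizing parameter, not Cassandra's value.
    static long wordsInRowFilter(long columns, long bitsPerColumn) {
        long bits = columns * bitsPerColumn;
        return (bits + 63) / 64; // round up to whole 64-bit words
    }

    public static void main(String[] args) {
        long twoGiB = 2L * 1024 * 1024 * 1024;
        System.out.println(perPageEstimate(twoGiB, 4 * 1024));  // 524288
        System.out.println(wordsInRowFilter(1_000_000L, 15));   // 234375
    }
}
```

Under that assumed sizing, one filter per row still means hundreds of thousands of readLong calls for a million-column row, so the column count, not the page count, drives the cost.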

It could take a while if there are a lot of sstables in the read. 

nodetool cfhistograms will let you know: run it once to reset the counts, then do your test,
then run it again.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/02/2013, at 4:13 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> It is interesting how much press C* got for supporting 2 billion columns in
> a row. You *can* do it, but it brings to light some realities of what
> that means.
> 
> On Sun, Feb 3, 2013 at 8:09 AM, Takenori Sato <tsato@cloudian.com> wrote:
>> Hi Aaron,
>> 
>> Thanks for your answers. That helped me get the big picture.
>> 
>> Yes, it contains a big row that goes up to 2GB with more than a million
>> columns.
>> 
>> Let me confirm if I correctly understand.
>> 
>> - The stack trace is from a Slice By Names query, and the deserialization is
>> at step 3, "Read the row level Bloom Filter", on your blog.
>> 
>> - BloomFilterSerializer#deserialize does readLong iteratively at each page
>> of size 4K for a given row, which means it could be 500,000 loops (calls to
>> readLong) for a 2G row (from the 1.0.7 source).
>> 
>> Correct?
>> 
>> That explains why Slice By Names queries against such a wide row could be a
>> CPU bottleneck. In fact, in our test environment, a single
>> BloomFilterSerializer#deserialize call in such a case takes more than 10ms,
>> up to 100ms.
>> 
>>> Get a single named column.
>>> Get the first 10 columns using the natural column order.
>>> Get the last 10 columns using the reversed order.
>> 
>> Interesting. So the query pattern could make a difference?
>> 
>> We thought the only solution was to change the data model (don't use such
>> a wide row if it is retrieved by Slice By Names queries).
>> 
>> Anyway, will give it a try!
>> 
>> Best,
>> Takenori
>> 
>> On Sat, Feb 2, 2013 at 2:55 AM, aaron morton <aaron@thelastpickle.com>
>> wrote:
>>> 
>>> 5. the problematic Data file contains data for only 5 to 10 keys but is
>>> large (2.4G)
>>> 
>>> So, very large rows?
>>> What does nodetool cfstats or cfhistograms say about the row sizes ?
>>> 
>>> 
>>> 1. what is happening?
>>> 
>>> I think this is partly large rows and partly the query pattern. The
>>> following is only roughly correct, but see
>>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and my talk here
>>> http://www.datastax.com/events/cassandrasummit2012/presentations
>>> 
>>> 3. any more info required to proceed?
>>> 
>>> Do some tests with different query techniques…
>>> 
>>> Get a single named column.
>>> Get the first 10 columns using the natural column order.
>>> Get the last 10 columns using the reversed order.
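The three query techniques above differ in which columns they ask for. The sketch below models a wide row as an in-memory `TreeMap` of column name to value, purely to show what each pattern returns; it says nothing about Cassandra's on-disk cost. In the Thrift API of that era the first pattern roughly corresponds to a SlicePredicate with column_names (the SliceByNamesReadCommand path in the stack trace further down), and the other two to a SliceRange with a count and the reversed flag. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for one wide row: column name -> value, sorted by name.
// This models only WHAT each query returns, not Cassandra's on-disk cost.
class QueryPatterns {
    // 1. Get a single named column (a by-names lookup).
    static String byName(NavigableMap<String, String> row, String column) {
        return row.get(column);
    }

    // 2. Get the first n columns using the natural column order.
    static List<String> firstN(NavigableMap<String, String> row, int n) {
        return take(row.navigableKeySet(), n);
    }

    // 3. Get the last n columns using the reversed order.
    static List<String> lastNReversed(NavigableMap<String, String> row, int n) {
        return take(row.descendingKeySet(), n);
    }

    // Collects at most n column names from an already-ordered iteration.
    private static List<String> take(Iterable<String> names, int n) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (out.size() == n) {
                break;
            }
            out.add(name);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 1_000; i++) {
            row.put(String.format("col%05d", i), "v" + i);
        }
        System.out.println(byName(row, "col00500"));  // v500
        System.out.println(firstN(row, 3));           // [col00000, col00001, col00002]
        System.out.println(lastNReversed(row, 3));    // [col00999, col00998, col00997]
    }
}
```

Patterns 2 and 3 only touch the ends of the sorted row, while pattern 1 looks columns up by name, which is why timing all three against the same wide row is a useful comparison.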
>>> 
>>> Hope that helps.
>>> 
>>> On 31/01/2013, at 7:20 PM, Takenori Sato <tsato@cloudian.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> We have a situation where the CPU load on some of the nodes in our cluster
>>> has spiked occasionally since last November, triggered by requests for
>>> rows that reside in two specific sstables.
>>> 
>>> We confirmed the following (while the CPU was spiking):
>>> 
>>> version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8
>>> jdk: Oracle 1.6.0
>>> 
>>> 1. profiling showed that BloomFilterSerializer#deserialize was the
>>> hotspot (70% of the total load by running threads)
>>> 
>>> * the stack trace looked like this (simplified):
>>> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
>>> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
>>> ...
>>> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
>>> ...
>>> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
>>> ...
>>> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
>>> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
>>> 66.7% - java.io.DataInputStream.readLong
>>> 
>>> 2. Usually, the call in 1 should be so fast that a sampling profiler
>>> cannot detect it
>>> 
>>> 3. no pressure on Cassandra's JVM heap or on the machine overall
>>> 
>>> 4. only a little I/O traffic on our 8 disks per node (up to 100 tps/disk
>>> as measured by "iostat 1 1000")
>>> 
>>> 5. the problematic Data file contains data for only 5 to 10 keys but is
>>> large (2.4G)
>>> 
>>> 6. the problematic Filter file is only 256B (which could be normal)
>>> 
>>> 
>>> So now I am trying to read the Filter file as closely as possible to the
>>> way BloomFilterSerializer#deserialize does, in order to see whether
>>> something is wrong with the file.
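One way to simulate the deserialize step outside Cassandra is to write a bitset to a file and time reading it back with `DataInputStream.readLong`, one call per 64-bit word, which is the leaf of the stack trace above. This is a minimal sketch under stated assumptions: the file layout (an int hash count, an int word count, then the words), the class name `BloomFilterSimulation`, the temp-file name and the word count are all hypothetical and not verified against the real 1.0.7 serializer, so check the layout against the source before pointing this at an actual Filter file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BloomFilterSimulation {
    // HYPOTHETICAL layout, for illustration only (not verified against
    // Cassandra 1.0.7): int hashCount, int wordCount, then wordCount longs.
    static byte[] serialize(int hashCount, long[] words) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(hashCount);
            out.writeInt(words.length);
            for (long w : words) {
                out.writeLong(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Mirrors the hot path in the stack trace: one readLong() per word,
    // so the cost grows linearly with the size of the bitset.
    static long[] deserializeWords(InputStream source) {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(source))) {
            in.readInt();                  // hash count, not needed here
            int wordCount = in.readInt();
            long[] words = new long[wordCount];
            for (int i = 0; i < wordCount; i++) {
                words[i] = in.readLong();
            }
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary test size and contents, chosen only for the experiment.
        long[] words = new long[234_375];
        for (int i = 0; i < words.length; i++) {
            words[i] = i * 0x9E3779B97F4A7C15L;
        }
        Path file = Files.createTempFile("filter-sim", ".db");
        Files.write(file, serialize(5, words));

        long start = System.nanoTime();
        long[] read = deserializeWords(Files.newInputStream(file));
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.println(read.length + " words read in " + elapsedMicros + " us");
        Files.delete(file);
    }
}
```

Wrapping the file stream in a `BufferedInputStream` avoids issuing a separate read on the underlying file for every readLong; timing with and without it is a quick way to separate I/O cost from the per-call CPU cost of the loop.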
>>> 
>>> Could you give me some advice on:
>>> 
>>> 1. what is happening?
>>> 2. the best way to simulate the BloomFilterSerializer#deserialize
>>> 3. any more info required to proceed?
>>> 
>>> Thanks,
>>> Takenori
>>> 
>>> 
>> 

