incubator-cassandra-user mailing list archives

From "Takenori Sato(Cloudian)" <ts...@cloudian.com>
Subject Re: CPU hotspot at BloomFilterSerializer#deserialize
Date Wed, 06 Feb 2013 01:32:43 GMT
Hi,

We found that this issue is specific to 1.0.1 through 1.0.8, and that it
was fixed in 1.0.9.

https://issues.apache.org/jira/browse/CASSANDRA-4023

So by upgrading, we will see reasonable performance no matter how
large a row we have!
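
For anyone curious, the hotspot boils down to one DataInputStream.readLong
call per 64-bit word of the row-level bloom filter's bit set, with no
buffering in between. A fix along these lines (an illustrative sketch
under that assumption, not the actual patch) would read the words in bulk
instead:

    // Illustrative sketch only -- paraphrased, not the actual
    // 1.0.7/1.0.9 sources. "dis" reads the Filter component;
    // "wordCount" is the number of 64-bit words in the bit set.
    import java.io.*;
    import java.nio.ByteBuffer;

    class BitSetReadSketch
    {
        // Before: one stream call per word -- the profiled hotspot.
        static long[] readPerWord(DataInputStream dis, int wordCount) throws IOException
        {
            long[] bits = new long[wordCount];
            for (int i = 0; i < wordCount; i++)
                bits[i] = dis.readLong();
            return bits;
        }

        // After: a single bulk read, then decode the words locally.
        static long[] readBulk(DataInputStream dis, int wordCount) throws IOException
        {
            byte[] buf = new byte[wordCount * 8];
            dis.readFully(buf);
            ByteBuffer bb = ByteBuffer.wrap(buf);
            long[] bits = new long[wordCount];
            for (int i = 0; i < wordCount; i++)
                bits[i] = bb.getLong();
            return bits;
        }
    }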

Thanks,
Takenori

(2013/02/05 2:29), aaron morton wrote:
>> Yes, it contains a big row that goes up to 2 GB with more than a
>> million columns.
> I've run tests with 10 million small columns and got reasonable 
> performance. I've not looked at 1 million large columns.
>
>>> - BloomFilterSerializer#deserialize does readLong iteratively at each
>>> page of size 4K for a given row, which means it could be 500,000
>>> loops (readLong calls) for a 2 GB row (from the 1.0.7 source).
> There is only one Bloom filter per row in an SSTable, not one per 
> column index/page.
>
> It could take a while if there are a lot of sstables in the read.
>
> nodetool cfhistograms will let you know; run it once to reset the 
> counts, then do your test, then run it again.
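>
> For example (keyspace and column family names here are made up):
>
>     nodetool -h localhost cfhistograms MyKeyspace MyColumnFamily
>     (run the test workload)
>     nodetool -h localhost cfhistograms MyKeyspace MyColumnFamily
>
> The "SSTables" column of the second run shows how many sstables
> recent reads had to touch.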
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/02/2013, at 4:13 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> It is interesting, the press c* got about having 2 billion columns in a
>> row. You *can* do it, but it brings to light some realities of what
>> that means.
>>
>> On Sun, Feb 3, 2013 at 8:09 AM, Takenori Sato <tsato@cloudian.com> wrote:
>>> Hi Aaron,
>>>
>>> Thanks for your answers. That helped me get a big picture.
>>>
>>> Yes, it contains a big row that goes up to 2 GB with more than a
>>> million columns.
>>>
>>> Let me confirm that I understand correctly.
>>>
>>> - The stack trace is from a Slice By Names query, and the
>>> deserialization is at step 3, "Read the row level Bloom Filter", on
>>> your blog.
>>>
>>> - BloomFilterSerializer#deserialize does readLong iteratively at each
>>> page of size 4K for a given row, which means it could be 500,000
>>> loops (readLong calls) for a 2 GB row (from the 1.0.7 source).
>>>
>>> Correct?
>>>
>>> That makes sense: Slice By Names queries against such a wide row could
>>> be a CPU bottleneck. In fact, in our test environment, a single
>>> BloomFilterSerializer#deserialize call in such a case takes more than
>>> 10 ms, up to 100 ms.
>>>
>>>> Get a single named column.
>>>> Get the first 10 columns using the natural column order.
>>>> Get the last 10 columns using the reversed order.
>>>
>>> Interesting. So the query pattern could make a difference?
>>>
>>> We thought the only solution was to change the data structure (don't
>>> use such a wide row if it is retrieved by a Slice By Names query).
>>>
>>> Anyway, will give it a try!
>>>
>>> Best,
>>> Takenori
>>>
>>> On Sat, Feb 2, 2013 at 2:55 AM, aaron morton <aaron@thelastpickle.com>
>>> wrote:
>>>>
>>>> 5. the problematic Data file contains data for only 5 to 10 keys but
>>>> is large (2.4 GB)
>>>>
>>>> So very large rows?
>>>> What does nodetool cfstats or cfhistograms say about the row sizes?
>>>>
>>>>
>>>> 1. what is happening?
>>>>
>>>> I think this is partly the large rows and partly the query pattern.
>>>> This post is only roughly correct by now, but see
>>>> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and my
>>>> talk here:
>>>> http://www.datastax.com/events/cassandrasummit2012/presentations
>>>>
>>>> 3. any more info required to proceed?
>>>>
>>>> Do some tests with different query techniques (see the sketch after
>>>> this list)…
>>>>
>>>> Get a single named column.
>>>> Get the first 10 columns using the natural column order.
>>>> Get the last 10 columns using the reversed order.
>>>>
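>>>> For example, with the raw Thrift API, something like this (a rough
>>>> sketch; "client" is an already-connected Cassandra.Client, and the
>>>> column family, row key and column names are made up):
>>>>
>>>>     import java.nio.ByteBuffer;
>>>>     import java.util.Arrays;
>>>>     import org.apache.cassandra.thrift.*;
>>>>     import org.apache.cassandra.utils.ByteBufferUtil;
>>>>
>>>>     ColumnParent parent = new ColumnParent("cf");
>>>>     ByteBuffer key = ByteBufferUtil.bytes("rowKey");
>>>>
>>>>     // 1. a single named column
>>>>     SlicePredicate byName = new SlicePredicate();
>>>>     byName.setColumn_names(Arrays.asList(ByteBufferUtil.bytes("col1")));
>>>>     client.get_slice(key, parent, byName, ConsistencyLevel.ONE);
>>>>
>>>>     // 2. the first 10 columns in natural column order
>>>>     SlicePredicate first10 = new SlicePredicate();
>>>>     first10.setSlice_range(new SliceRange(
>>>>         ByteBufferUtil.EMPTY_BYTE_BUFFER, ByteBufferUtil.EMPTY_BYTE_BUFFER,
>>>>         false, 10));
>>>>     client.get_slice(key, parent, first10, ConsistencyLevel.ONE);
>>>>
>>>>     // 3. the last 10 columns, using reversed order
>>>>     SlicePredicate last10 = new SlicePredicate();
>>>>     last10.setSlice_range(new SliceRange(
>>>>         ByteBufferUtil.EMPTY_BYTE_BUFFER, ByteBufferUtil.EMPTY_BYTE_BUFFER,
>>>>         true, 10));
>>>>     client.get_slice(key, parent, last10, ConsistencyLevel.ONE);
>>>>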
>>>> Hope that helps.
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 31/01/2013, at 7:20 PM, Takenori Sato <tsato@cloudian.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We have a situation where the CPU load on some of our nodes in a
>>>> cluster has spiked occasionally since last November, triggered by
>>>> requests for rows that reside on two specific sstables.
>>>>
>>>> We confirmed the following (when spiked):
>>>>
>>>> version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8
>>>> jdk: Oracle 1.6.0
>>>>
>>>> 1. profiling showed that BloomFilterSerializer#deserialize was the
>>>> hotspot (70% of the total load across running threads)
>>>>
>>>> * the stack trace looked like this (simplified)
>>>> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
>>>> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
>>>> ...
>>>> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
>>>> ...
>>>> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
>>>> ...
>>>> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
>>>> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
>>>> 66.7% - java.io.DataInputStream.readLong
>>>>
>>>> 2. usually, step 1 should be so fast that sampling-based profiling
>>>> cannot detect it
>>>>
>>>> 3. no pressure on Cassandra's JVM heap nor on the machine overall
>>>>
>>>> 4. a little I/O traffic for our 8 disks/node (up to 100 tps/disk by
>>>> "iostat 1 1000")
>>>>
>>>> 5. the problematic Data file contains data for only 5 to 10 keys but
>>>> is large (2.4 GB)
>>>>
>>>> 6. the problematic Filter file size is only 256 B (which could be normal)
>>>>
>>>>
>>>> So now, I am trying to read the Filter file in the same way
>>>> BloomFilterSerializer#deserialize does, as closely as I can, in order
>>>> to see if something is wrong with the file.
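>>>>
>>>> Something like this is what I have in mind (a rough sketch; it
>>>> assumes the Filter file starts with an int hash count followed by
>>>> the bit set's word count and the words themselves, so treat the
>>>> layout as an assumption, not fact):
>>>>
>>>>     import java.io.*;
>>>>
>>>>     public class FilterFileCheck
>>>>     {
>>>>         public static void main(String[] args) throws IOException
>>>>         {
>>>>             // args[0]: path to a copy of the suspect *-Filter.db file
>>>>             DataInputStream in = new DataInputStream(
>>>>                 new BufferedInputStream(new FileInputStream(args[0])));
>>>>             try
>>>>             {
>>>>                 int hashCount = in.readInt();  // layout assumed
>>>>                 int wordCount = in.readInt();  // layout assumed
>>>>                 long nonZero = 0;
>>>>                 for (int i = 0; i < wordCount; i++)
>>>>                     if (in.readLong() != 0)
>>>>                         nonZero++;
>>>>                 System.out.println("hashes=" + hashCount + ", words="
>>>>                                    + wordCount + ", non-zero=" + nonZero);
>>>>             }
>>>>             finally
>>>>             {
>>>>                 in.close();
>>>>             }
>>>>         }
>>>>     }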
>>>>
>>>> Could you give me some advice on:
>>>>
>>>> 1. what is happening?
>>>> 2. the best way to simulate BloomFilterSerializer#deserialize
>>>> 3. any more info required to proceed?
>>>>
>>>> Thanks,
>>>> Takenori
>>>>
>>>>
>>>
>

