incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Takenori Sato <ts...@cloudian.com>
Subject Re: CPU hotspot at BloomFilterSerializer#deserialize
Date Sun, 03 Feb 2013 13:09:04 GMT
Hi Aaron,

Thanks for your answers. That helped me get a big picture.

Yes, it contains a big row that goes up to 2GB with more than a million of
columns.

Let me confirm if I correctly understand.

- The stack trace is from Slice By Names query. And the deserialization is
at the step 3, "Read the row level Bloom Filter", on your blog.

- BloomFilterSerializer#deserialize does readLong iteratively at each page
of size 4K for a given row, which means it could be 500,000 loops(calls
readLong) for a 2G row(from 1.0.7 source).

Correct?

That makes sense Slice By Names queries against such a wide row could be
CPU bottleneck. In fact, in our test environment, a
BloomFilterSerializer#deserialize of such a case takes more than 10ms, up
to 100ms.

> Get a single named column.
> Get the first 10 columns using the natural column order.
> Get the last 10 columns using the reversed order.

Interesting. A query pattern could make a difference?

We thought the only solutions is to change the data structure(don't use
such a wide row if it is retrieved by Slice By Names query).

Anyway, will give it a try!

Best,
Takenori

On Sat, Feb 2, 2013 at 2:55 AM, aaron morton <aaron@thelastpickle.com>wrote:

> 5. the problematic Data file contains only 5 to 10 keys data but
> large(2.4G)
>
> So very large rows ?
> What does nodetool cfstats or cfhistograms say about the row sizes ?
>
>
> 1. what is happening?
>
> I think this is partially large rows and partially the query pattern, this
> is only by roughly correct
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and my talk
> here http://www.datastax.com/events/cassandrasummit2012/presentations
>
> 3. any more info required to proceed?
>
> Do some tests with different query techniques…
>
> Get a single named column.
> Get the first 10 columns using the natural column order.
> Get the last 10 columns using the reversed order.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 31/01/2013, at 7:20 PM, Takenori Sato <tsato@cloudian.com> wrote:
>
> Hi all,
>
> We have a situation that CPU loads on some of our nodes in a cluster has
> spiked occasionally since the last November, which is triggered by requests
> for rows that reside on two specific sstables.
>
> We confirmed the followings(when spiked):
>
> version: 1.0.7(current) <- 0.8.6 <- 0.8.5 <- 0.7.8
> jdk: Oracle 1.6.0
>
> 1. a profiling showed that BloomFilterSerializer#deserialize was the
> hotspot(70% of the total load by running threads)
>
> * the stack trace looked like this(simplified)
> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
> ...
> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
> ...
> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
> ...
> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
> 66.7% - java.io.DataInputStream.readLong
>
> 2. Usually, 1 should be so fast that a profiling by sampling can not detect
>
> 3. no pressure on Cassandra's VM heap nor on machine in overal
>
> 4. a little I/O traffic for our 8 disks/node(up to 100tps/disk by "iostat
> 1 1000")
>
> 5. the problematic Data file contains only 5 to 10 keys data but
> large(2.4G)
>
> 6. the problematic Filter file size is only 256B(could be normal)
>
>
> So now, I am trying to read the Filter file in the same way
> BloomFilterSerializer#deserialize does as possible as I can, in order to
> see if the file is something wrong.
>
> Could you give me some advise on:
>
> 1. what is happening?
> 2. the best way to simulate the BloomFilterSerializer#deserialize
> 3. any more info required to proceed?
>
> Thanks,
> Takenori
>
>
>

Mime
View raw message