Yes, it contains a big row that goes up
to 2GB with more than a million of columns.
I've run tests with 10 million small columns and reasonable
performance. I've not looked at 1 million large columns.
- BloomFilterSerializer#deserialize does
readLong iteratively at each page
of size 4K for a given row, which means it could be 500,000
loops(calls
readLong) for a 2G row(from 1.0.7 source).
There is only one Bloom filter per row in an SSTable, not one per
column index/page.
It could take a while if there are a lot of sstables in the
read.
nodetool cfhistorgrams will let you know, run it once to
reset the counts , then do your test, then run it again.
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
It is interesting the press c* got
about having 2 billion columns in a
row. You *can* do it but it brings to light some realities
of what
that means.
On Sun, Feb 3, 2013 at 8:09 AM, Takenori Sato <tsato@cloudian.com>
wrote:
Hi Aaron,
Thanks for your answers. That helped me get a big
picture.
Yes, it contains a big row that goes up to 2GB with more
than a million of
columns.
Let me confirm if I correctly understand.
- The stack trace is from Slice By Names query. And the
deserialization is
at the step 3, "Read the row level Bloom Filter", on
your blog.
- BloomFilterSerializer#deserialize does readLong
iteratively at each page
of size 4K for a given row, which means it could be
500,000 loops(calls
readLong) for a 2G row(from 1.0.7 source).
Correct?
That makes sense Slice By Names queries against such a
wide row could be CPU
bottleneck. In fact, in our test environment, a
BloomFilterSerializer#deserialize of such a case takes
more than 10ms, up to
100ms.
Get a single named column.
Get the first 10 columns using the natural column
order.
Get the last 10 columns using the reversed order.
Interesting. A query pattern could make a difference?
We thought the only solutions is to change the data
structure(don't use such
a wide row if it is retrieved by Slice By Names query).
Anyway, will give it a try!
Best,
Takenori
On Sat, Feb 2, 2013 at 2:55 AM, aaron morton <aaron@thelastpickle.com>
wrote:
5. the problematic Data file contains only 5 to 10
keys data but
large(2.4G)
So very large rows ?
What does nodetool cfstats or cfhistograms say about
the row sizes ?
1. what is happening?
I think this is partially large rows and partially the
query pattern, this
is only by roughly correct
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
and my talk here
http://www.datastax.com/events/cassandrasummit2012/presentations
3. any more info required to proceed?
Do some tests with different query techniques…
Get a single named column.
Get the first 10 columns using the natural column
order.
Get the last 10 columns using the reversed order.
Hope that helps.
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 31/01/2013, at 7:20 PM, Takenori Sato
<tsato@cloudian.com> wrote:
Hi all,
We have a situation that CPU loads on some of our
nodes in a cluster has
spiked occasionally since the last November, which is
triggered by requests
for rows that reside on two specific sstables.
We confirmed the followings(when spiked):
version: 1.0.7(current) <- 0.8.6 <- 0.8.5 <-
0.7.8
jdk: Oracle 1.6.0
1. a profiling showed that
BloomFilterSerializer#deserialize was the
hotspot(70% of the total load by running threads)
* the stack trace looked like this(simplified)
90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
90.4% -
org.apache.cassandra.db.SliceByNamesReadCommand.getRow
...
90.4% -
org.apache.cassandra.db.CollationController.collectTimeOrderedData
...
89.5% -
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
...
79.9% -
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
68.9% -
org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
66.7% - java.io.DataInputStream.readLong
2. Usually, 1 should be so fast that a profiling by
sampling can not
detect
3. no pressure on Cassandra's VM heap nor on machine
in overal
4. a little I/O traffic for our 8 disks/node(up to
100tps/disk by "iostat
1 1000")
5. the problematic Data file contains only 5 to 10
keys data but
large(2.4G)
6. the problematic Filter file size is only 256B(could
be normal)
So now, I am trying to read the Filter file in the
same way
BloomFilterSerializer#deserialize does as possible as
I can, in order to see
if the file is something wrong.
Could you give me some advise on:
1. what is happening?
2. the best way to simulate the
BloomFilterSerializer#deserialize
3. any more info required to proceed?
Thanks,
Takenori