incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
Date Tue, 09 Mar 2010 14:31:53 GMT
On Tue, Mar 9, 2010 at 2:52 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> By "reads" do you mean what stress.py counts (rows) or rows * columns?
>  If it is rows, then you are still actually reading more columns/s in
> case 2.

Well, unless I'm mistaken, that's the same in my example: in both cases I give
stress.py the option '-c 1', which tells it to retrieve only one column each
time, even in the case where I have 100 columns per row.
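For anyone trying to reproduce this, a sketch of the two invocations I mean.
Only '-c 1' is stated above; the other flags and counts are my assumptions
about contrib/py_stress and may need adjusting:

```shell
# Case 1: many rows, one column per row (counts are illustrative).
python stress.py -o insert -n 1000000 -c 1
# Case 2: fewer rows, 100 columns per row.
python stress.py -o insert -n 10000 -c 100
# Reads use '-c 1' in both cases, so each read fetches a single column.
python stress.py -o read -n 1000000 -c 1
```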

>> And it doesn't go up significantly after
>> that. Turns out this seems to be a GC problem. Indeed, the info log (I'm
>> running trunk from today, but I first saw the problem on an older version of
>> trunk) show every few seconds lines like:
>>  GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving
>> 1033481216 used; max is 1211498496
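(To spell out why that line worries me, a quick back-of-the-envelope check in
Python, using only the numbers from the log line above: the heap is still about
85% full after a 4.6 s full collection that reclaimed under 5% of it.)

```python
# Numbers taken from the ConcurrentMarkSweep log line above.
reclaimed = 57247304      # bytes freed by the 4599 ms collection
used_after = 1033481216   # bytes still live afterwards
heap_max = 1211498496     # max heap, roughly 1.2 GB

print("heap still used: {:.1%}".format(used_after / heap_max))  # ~85.3%
print("reclaimed:       {:.1%}".format(reclaimed / heap_max))   # ~4.7%
```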
>
> First, use the 0.6 branch, not trunk.  We're breaking stuff over there.

Fair enough, I will do the test with 0.6. But again, I saw such behavior with
a trunk from about 3 weeks ago; I just mean I don't believe it to be something
broken recently. But I admit I should have tried with 0.6, and I will.

> What happens if you give the jvm 50% more ram?

A quick test doesn't show the problem with 50% more RAM, at least not in a
short time frame. But I'm still not convinced there is no problem: I saw pretty
weird performance with bigger columns. Let me try to come up with a more
compelling test against 0.6. I'll keep you posted, even if I'm wrong :)
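For the record, the heap bump would be a change like the following in
conf/cassandra.in.sh; the variable name is from memory of the 0.6 layout, and
the sizes are only illustrative:

```shell
# conf/cassandra.in.sh -- raise the max heap ~50% over the ~1.2 GB
# implied by 'max is 1211498496' in the GC line (values illustrative).
JVM_OPTS="$JVM_OPTS -Xms1800M -Xmx1800M"
```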

> Are you using a 64-bit JVM?

yep

--
Sylvain
