incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
Date Tue, 09 Mar 2010 13:52:22 GMT
On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
>  1) stress.py -t 10 -o read -n 50000000 -c 1 -r
>  2) stress.py -t 10 -o read -n 500000 -c 1 -r
>
> In case 1) I get around 200 reads/second, and that's pretty stable. The
> disk is spinning like crazy (~25% io_wait), very little cpu or memory used;
> performance is IO bound, which is expected.
> In case 2) however, it starts with reasonable performance (400+
> reads/second), but it very quickly drops to an average of 80 reads/second

By "reads" do you mean what stress.py counts (rows) or rows * columns?
 If it is rows, then you are still actually reading more columns/s in
case 2.
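
To make that comparison concrete, here is a sketch with hypothetical
column counts (the quoted commands both show -c 1, so the real counts
aren't visible in the quote; the subject line suggests case 2 is the
many-columns case):

  # Hypothetical: case 1 = 1 column/row, case 2 = 100 columns/row.
  echo $(( 200 * 1 ))    # case 1: 200 rows/s * 1   =  200 columns/s
  echo $(( 80 * 100 ))   # case 2:  80 rows/s * 100 = 8000 columns/s

If something like that holds, case 2 is moving far more data per second
even though stress.py's rows/s figure looks worse.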

> And it doesn't go up significantly after
> that. It turns out this seems to be a GC problem. Indeed, the info log (I'm
> running trunk from today, but I first saw the problem on an older version of
> trunk) shows lines like the following every few seconds:
>  GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving
> 1033481216 used; max is 1211498496
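
Decoding that line (the figures are bytes, taken straight from the
quote above):

  # Interpreting the quoted GC line; all values are in bytes.
  echo $(( 57247304 / 1024 / 1024 ))         # ~54 MB reclaimed by a 4599 ms CMS cycle
  echo $(( 1033481216 / 1024 / 1024 ))       # ~985 MB still live afterwards
  echo $(( 1033481216 * 100 / 1211498496 ))  # ~85% of the ~1.13 GB heap still in use

A 4.6 s collection that frees only ~5% of the heap means CMS is running
nearly back to back, which fits the throughput collapse you're seeing.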

First, use the 0.6 branch, not trunk. We're breaking stuff over there.

What happens if you give the JVM 50% more RAM?
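
A minimal sketch, assuming the 0.6-era layout where the heap is set via
JVM_OPTS in bin/cassandra.in.sh and the stock heap is 1 GB (exact flags
and defaults vary by build):

  # In bin/cassandra.in.sh: raise the 1 GB heap by 50%.
  # (The other default JVM_OPTS flags are omitted here.)
  JVM_OPTS="-Xms1536M -Xmx1536M"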

Are you using a 64-bit JVM?
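
One quick way to check is the version banner, which names the VM:

  # A 64-bit JVM reports something like "Java HotSpot(TM) 64-Bit Server VM"
  java -version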

-Jonathan
