incubator-cassandra-user mailing list archives

From Mark Jones <MJo...@imagehawk.com>
Subject RE: Very new user needs some troubleshooting pointers
Date Fri, 09 Apr 2010 16:14:52 GMT
Sounds like we are experiencing some of the same problems. (I'm using 0.6RC1.) I have a 3-node
cluster with 8GB/machine (dual-core CPU).  I'm peaking on inserts at about 6000-7000/second
running 40 threads.  Separate spindles for commitlog and data.
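
As a rough sanity check on those insert numbers, Little's law (requests in flight = throughput x latency) gives the average latency each thread is seeing. This is only a sketch: it assumes every one of the 40 threads issues one synchronous insert at a time, which the message does not state, and `implied_latency_ms` is a name made up for illustration.

```python
def implied_latency_ms(threads, ops_per_sec):
    """Little's law: in-flight = throughput * latency, so
    latency = in-flight / throughput.

    Assumption: each thread keeps exactly one request in flight.
    """
    return threads / ops_per_sec * 1000.0


if __name__ == "__main__":
    # Figures taken from the message: 40 threads, 6000-7000 inserts/second.
    for rate in (6000, 7000):
        latency = implied_latency_ms(40, rate)
        print(f"{rate}/s over 40 threads -> {latency:.2f} ms per insert")
```

Under that assumption each insert takes roughly 6 to 7 ms end to end, so the cluster-wide rate is bounded by per-request latency times concurrency rather than by any single slow call.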

My read speed is atrocious: 800/sec sustained (it starts off at 1800+/second and falls back to
800/sec).  Of course, that is only if I read from the "correct" node.  Depending on the moment,
2 of the nodes will return 1-2/second instead of 800, and only one node will return 800/second.
And if I spread the reads across many nodes, overall performance drops.  nodetool loadbalance
can change which node is the "golden" node, but I don't know why.  I have doubled the # of
concurrent read threads and seen some performance improvement (that was the last thing I
tried, and it eked out another 150/second).

So much about Cassandra makes me WANT it to work. I mean, look at the fact that all nodes are
essentially equal, that it replicates from rack to rack, from DC to DC. Now, if I could just
make it perform.

My machines are basically idle (a large amount of IOWait, but the time is spent in the pending
queue rather than in the device svctime).  So far I've got little insight into what could be
wrong. I've increased the key cache 10X using JConsole, but the hit rate is still at times abysmal.

I'm writing 400-800 byte blobs with an 8 byte key (supercolumn) and a 12 byte "subkey", then
a 5 byte column name, something that would seem to be right up Cassandra's alley.

Right now I'm reworking my test to dump it into MySQL on the same machines, so I can compare
the two for speed, because either I've got crap for hardware, or there is something rotten
in Denmark.

From: Heath Oderman [mailto:heath@526valley.com]
Sent: Friday, April 09, 2010 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Very new user needs some troubleshooting pointers

Thanks for the reply Jonathan!

I started with multi threaded tests, but when my performance was so much slower than my buddy's
I switched to one to try to isolate and identify the differences.  I got tunnel vision and
kept on with the one thread tests.

I'll modify the tests and try again.

Thanks,
Stu

On Fri, Apr 9, 2010 at 11:34 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
A single-threaded test is meaningless.  You need a multithreaded (or
multiprocess) benchmark like the one in contrib/py_stress.

Picture worth 1000 words: http://spyced.blogspot.com/2010/01/cassandra-05.html
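
To illustrate the point about multithreaded benchmarks, here is a minimal sketch of a harness that runs the same operation at several thread counts and reports throughput. It is not contrib/py_stress. `insert_one` is a hypothetical stand-in that only sleeps to simulate a round trip; a real test would replace it with an actual client call and would most likely need one connection per worker.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def insert_one(i):
    """Hypothetical stand-in for one client call (e.g. a Thrift batch_insert).

    It only sleeps for about 1 ms to simulate a network round trip.
    """
    time.sleep(0.001)
    return i


def run_benchmark(op, total_ops, num_threads):
    """Run `op` total_ops times across num_threads workers; return ops/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consume the iterator so every call finishes before the clock stops.
        completed = sum(1 for _ in pool.map(op, range(total_ops)))
    elapsed = time.perf_counter() - start
    return completed / elapsed


if __name__ == "__main__":
    for threads in (1, 8, 40):
        rate = run_benchmark(insert_one, 500, threads)
        print(f"{threads:>2} threads: {rate:,.0f} ops/sec")
```

With a single thread the measured rate is just the inverse of the per-call latency, which says little about what the server can sustain. Throughput should rise as threads are added until either the client or the server saturates.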

On Thu, Apr 8, 2010 at 3:59 PM, Heath Oderman <heath@526valley.com> wrote:
> Hi All,
> I'm brand new to Cassandra and know absolutely nothing, so please forgive me
> in advance.
> A friend and I have each set up a few Cassandra stand-alone nodes, completely
> default.
> His: Mac OSX Snow Leopard
>      Mac Book Pro
>      Intel Duo Core
>      4GB Ram
>      5400 rpm disk
> Mine: debian 5.x (lenny) with the deb pack from
> http://www.apache.org/dist/cassandra/debian
>      2  Desktops
>      Intel duo core
>      4GB ram
>      7200 sata drives
>     1 blade
>      8gb ram
>      10000 rpm disk
>      dual xeon
>     (i have a windows box too like the 2 desktops)
>
>     (each of those machines is stand alone)
>
> My debian boxes are brand new installs, nothing else running, purely console
> environments, only SSH & Cassandra installed.
> The Cassandra configs are the *default configs* with only 'ListenAddress'
> and 'ThriftAddress' changed to the ext ip for those boxes.
> We generated a C# library with Thrift to connect to these servers.  We wrote
> a simple C# app that loops 10,000 times and does a
>          _client.batch_insert(_keyspace, map.Key.GetValue(o, null).ToString(), dict, ConsistencyLevel.ONE);
> "batch_insert" I guess is the key bit up there.
> The reason that I'm writing is that the batch_insert call takes 400,000
> ticks every time it is called when running against the debian boxes.  Any of
> them.
> The result is that 10,000 inserts against his machine takes about 30
> seconds, and it takes about 1 min 45 seconds against any of my servers.
>  (longer against the windows 7 server.)
> The MacBook Pro is faster, while I would expect it to be slower.  (The MacBook
> Pro is his laptop and he's running mail and all kinds of other stuff
> simultaneously.)
> I'm on a gigabit network, iostat / top / bmon all show that the Cassandra
> server isn't working very hard.
> Performance Monitor on my Windows client shows that the computer running the
> loop is hardly working.
> I am writing to you to ask where I might go to get information on comparing
> the environments, improving my performance, etc.  I've been googling all day
> and haven't been able to figure anything out.
> If this is the wrong forum, sorry!
> Thanks for any help/suggestions you might have.
> Stu
>
>
>
>
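
The timings quoted in the original message can be turned into per-call figures with a few lines of arithmetic. This is only a sketch: `per_call_stats` and `ticks_to_ms` are names invented here, and the tick conversion assumes .NET TimeSpan/DateTime ticks of 100 ns each, which the message does not confirm (Stopwatch ticks depend on the hardware timer).

```python
def per_call_stats(num_calls, total_seconds):
    """Return (milliseconds per call, calls per second) for a serial loop."""
    ms_per_call = total_seconds * 1000.0 / num_calls
    calls_per_sec = num_calls / total_seconds
    return ms_per_call, calls_per_sec


# Assumption: one tick is 100 ns, as for .NET TimeSpan/DateTime ticks.
TICK_NS = 100


def ticks_to_ms(ticks):
    """Convert ticks to milliseconds under the 100 ns assumption above."""
    return ticks * TICK_NS / 1_000_000


if __name__ == "__main__":
    # Figures from the message: 10,000 inserts in about 30 s vs. 1 min 45 s.
    for label, seconds in (("MacBook Pro", 30), ("Debian boxes", 105)):
        ms, rate = per_call_stats(10_000, seconds)
        print(f"{label}: {ms:.1f} ms/call, {rate:.0f} calls/sec")
    print(f"400,000 ticks = {ticks_to_ms(400_000):.0f} ms (if 100 ns ticks)")
```

The loop averages 3.0 ms per call against the MacBook Pro and 10.5 ms against the Debian boxes, a factor of 3.5. Under the 100 ns assumption, 400,000 ticks would be 40 ms per call, which does not match the 10.5 ms average, so the tick unit is worth checking before reading much into that figure.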

