cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Cassandra Write Performance, CPU usage
Date Fri, 11 Jun 2010 19:22:39 GMT
Yes, it is expected that writes are CPU-bound.
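
The concurrent-write experiment proposed in the quoted message below could be sketched roughly as follows. This is only an illustration: `send_batch` is a hypothetical stand-in for one blocking batch-write RPC (it just sleeps), and `run_writers`, the batch size, and the worker count are assumed names and values, not part of any Cassandra client API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_batch(payload: bytes) -> int:
    """Hypothetical stand-in for one blocking batch-write RPC."""
    time.sleep(0.01)  # pretend to wait for the server's acknowledgement
    return len(payload)


def run_writers(num_batches: int, batch_size: int, workers: int) -> float:
    """Send num_batches payloads over `workers` threads; return bytes per second."""
    payload = b"x" * batch_size
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sends its next batch as soon as the previous one is acknowledged.
        total = sum(pool.map(send_batch, [payload] * num_batches))
    elapsed = time.perf_counter() - start
    return total / elapsed


if __name__ == "__main__":
    # workers=1 mimics the one-RPC-at-a-time client; more workers keep several RPCs in flight.
    for workers in (1, 4):
        rate = run_writers(num_batches=16, batch_size=2 * 1024 * 1024, workers=workers)
        print(f"{workers} worker(s): {rate / 1e6:.1f} MB/s")
```

With a real client in place of `send_batch`, watching CPU usage and throughput while varying `workers` would show whether the server stays CPU-bound as more requests are in flight.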

On Fri, Jun 11, 2010 at 11:29 AM, Rishi Bhardwaj <khichrishi@yahoo.com> wrote:
> I think it would be a good exercise to know what the CPU bottleneck is on
> the write path. The fact that Cassandra optimizes disk I/O for writes would
> only go so far if the CPU becomes a big bottleneck on continuous writes. I
> am fairly new to Java ecosystem performance profiling but I would give it a
> try and see if I can pinpoint the problem area here. I am also thinking
> about making concurrent writes to Cassandra instead of only one write at a
> time. This would probably make Cassandra beat the hell out of all CPU
> resources and confirm that Cassandra is CPU bound on continuous writes.
> Again, I would love to hear from Cassandra experts here and see what they
> think of this. Are Cassandra continuous bulk writes expected to be
> bottlenecked by CPU? If this is definitely the case and that's what it seems
> right now, then it would be a good thing to look at the algorithms in the
> write path.
> Thanks,
> Rishi
> ________________________________
> From: Mike Malone <mike@simplegeo.com>
> To: user@cassandra.apache.org
> Sent: Fri, June 11, 2010 9:20:06 AM
> Subject: Re: Cassandra Write Performance, CPU usage
>
> Jonathan, while I agree with you re: this being an unusual load for the
> system, it is interesting that he's found at least one use-case where
> Cassandra is CPU-bound, not IO-bound. I'd definitely be interested in
> learning what his critical path is and seeing if there's some low-hanging
> fruit that may improve performance overall. I have also noticed very high
> CPU usage during high write loads and have wondered whether write speed and
> throughput could be improved by improving some of the algorithms along that
> path.
> I'm nowhere near being an expert on the whole Java ecosystem, but I've had
> good luck with the `jvisualvm` tool that comes with Java SE 6. It's a nice
> lightweight CPU and memory profiling tool that can attach to a running
> process like Cassandra and dump stats in real time.
> Mike
>
> On Thu, Jun 10, 2010 at 7:39 PM, Jonathan Shook <jshook@gmail.com> wrote:
>>
>> You are testing Cassandra in a way that it was not designed to be used.
>> Bandwidth to disk is not a meaningful example for nearly anything
>> except for filesystem benchmarking and things very nearly the same as
>> filesystem benchmarking.
>> Unless the usage patterns of your application match your test data,
>> there is not a good reason to expect a strong correlation between this
>> test and actual performance.
>>
>> Cassandra is not simply shuffling data through IO when you write.
>> There are calculations that have to be done as writes filter their way
>> through various stages of processing. The point of this is to minimize
>> the overall effort Cassandra has to make in order to retrieve the data
>> again. One example would be bloom filters. Each column that is written
>> requires bloom filter processing and potentially auxiliary IO. Some of
>> these steps are allowed to happen in the background, but if you try,
>> you can cause them to stack up on top of the available CPU and memory
>> resources.
>>
>> In such a case (continuous bulk writes), you are causing all of these
>> costs to be taken in more of a synchronous (not delayed) fashion. You
>> are not allowing the background processing that helps reduce client
>> blocking (by deferring some processing) to do its magic.
>>
>>
>>
>> On Thu, Jun 10, 2010 at 7:42 PM, Rishi Bhardwaj <khichrishi@yahoo.com> wrote:
>> > Hi
>> > I am investigating Cassandra write performance and see very heavy CPU
>> > usage from Cassandra. I have a single-node Cassandra instance running on
>> > a dual-core (2.66 GHz Intel) Ubuntu 9.10 server. The writes to Cassandra
>> > are being generated from the same server using BatchMutate(). The client
>> > makes exactly one RPC call at a time to Cassandra. Each BatchMutate() RPC
>> > contains 2 MB of data and once it is acknowledged by Cassandra, the next
>> > RPC is done. Cassandra has two separate disks, one for commitlog with a
>> > sequential b/w of 130 MBps and the other a solid state disk for data with
>> > b/w of 90 MBps. Tuning various parameters, I observe that I am able to
>> > attain a maximum write performance of about 45 to 50 MBps from Cassandra.
>> > I see that the Cassandra java process consistently uses 100% to 150% of
>> > CPU resources (as shown by top) during the entire write operation. Also,
>> > iostat clearly shows that the max disk bandwidth is not reached at any
>> > time during the write operation. Every now and then the i/o activity on
>> > the "commitlog" disk and the data disk spikes, but it is never
>> > consistently maintained by Cassandra close to their peak. I would imagine
>> > that the CPU is probably the bottleneck here. Does anyone have any idea
>> > why Cassandra beats the heck out of the CPU here? Any suggestions on how
>> > to go about finding the exact bottleneck here?
>> > Some more information about the writes: I have 2 column families, the
>> > data though is mostly written in one column family with column sizes of
>> > around 32k and each row having around 256 or 512 columns. I would really
>> > appreciate any help here.
>> > Thanks,
>> > Rishi
>> >
>> >
>
>
>
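
To give a feel for the kind of per-write CPU work mentioned in the quoted thread above (bloom filter processing), here is a toy Bloom filter. It is a sketch under stated assumptions, not Cassandra's implementation: the class name, bit-array size, number of hashes, and the MD5-based double hashing are all illustrative choices.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from one digest via double hashing.
        digest = hashlib.md5(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        # Every inserted key costs one digest plus num_hashes bit updates.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add(b"row-1")
    print(bf.might_contain(b"row-1"))
```

Each `add` costs a hash computation and a few bit updates, so a sustained stream of inserts turns into steady CPU work even when no disk I/O is pending.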



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
