incubator-cassandra-user mailing list archives

From Julian Simon <jsi...@jules.com.au>
Subject Re: Poor performance; PHP & Thrift to blame
Date Tue, 30 Mar 2010 11:21:22 GMT
Yes, I tested it with and without APC - it had a negligible impact on
performance.

This didn't surprise me - most of the optimization that APC offers is
in the parsing of PHP code. Since the benchmark is a single PHP
process, the code parsing overhead occurs outside the benchmark loop.

Does anyone have any benchmarks for larger Cassandra queries from PHP
similar to what I'm trying to do?  The performance bottlenecks don't
show up on 1, 5, 10, or even 100-column query sets - only on larger
sets or query loops.
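
For anyone who wants to compare numbers, a minimal timing harness along
these lines would do. It is a Python sketch rather than PHP, and
fetch_columns is a hypothetical stand-in for a real get_slice() call, so
only the shape of the measurement is meant to carry over. Setup happens
once, outside the timed region, and each column count is timed
separately.

```python
import time


def fetch_columns(count):
    """Hypothetical stand-in for a get_slice() call returning `count` columns."""
    return [("col%06d" % i, b"value") for i in range(count)]


def time_fetch(fetch, count, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = fetch(count)
        elapsed = time.perf_counter() - start
        # Sanity check: the fetch really returned the requested column count.
        assert len(result) == count
        best = min(best, elapsed)
    return best


def run_benchmark(fetch, sizes=(1, 5, 10, 100, 500, 2000)):
    """Map each column count to its best observed fetch time in seconds."""
    return {size: time_fetch(fetch, size) for size in sizes}


if __name__ == "__main__":
    for size, seconds in run_benchmark(fetch_columns).items():
        print("%5d columns: %.3f ms" % (size, seconds * 1000.0))
```

Swapping fetch_columns for a function that performs a real slice query
should show whether latency grows roughly linearly with the column
count or worse.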

Anyone doing time series analysis?  This is the sort of use case where
I'd expect to see much larger query sets.

I suppose Facebook and Digg are only pulling out small column sets, so
they wouldn't necessarily notice this issue.
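
The 32-bit LONG handling mentioned in the quoted message below comes
down to how a 64-bit integer is written as 8 big-endian bytes. The
following is a Python sketch of the two approaches, not the actual
Thrift PHP code; both function names are hypothetical. It assumes the
slow path splits the value into two 32-bit words, which is the usual
workaround when the host integer type is only 32 bits wide.

```python
import struct


def pack_i64_native(value):
    """Pack a signed 64-bit integer as 8 big-endian bytes in one call."""
    return struct.pack(">q", value)


def pack_i64_split(value):
    """Pack the same value as two unsigned 32-bit big-endian words.

    Illustrates the high-word/low-word split a 32-bit host needs.
    """
    unsigned = value & 0xFFFFFFFFFFFFFFFF  # two's complement view
    hi = (unsigned >> 32) & 0xFFFFFFFF
    lo = unsigned & 0xFFFFFFFF
    return struct.pack(">II", hi, lo)


if __name__ == "__main__":
    for sample in (0, 1, -1, 2 ** 40, -(2 ** 63), 2 ** 63 - 1):
        assert pack_i64_native(sample) == pack_i64_split(sample)
    print("both encodings agree")
```

Both functions produce identical bytes, so any cost difference lies in
the extra arithmetic per value, which adds up when a row carries
thousands of timestamped columns.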



On Tue, Mar 30, 2010 at 8:00 PM, David Timothy Strauss
<david@fourkitchens.com> wrote:
> Without APC, there should be even more of an improvement with the Thrift PHP extension.
>
> ----- "Rauan Maemirov" <rauan@maemirov.com> wrote:
>
>> What about APC? Did you turn it on?
>>
>> 2010/3/30 Julian Simon <jsimon@jules.com.au>:
>> > Hi,
>> >
>> > I've been trying to benchmark Cassandra for our use case and have been
>> > seeing poor performance on both writes and (extremely) poor
>> > performance on reads.
>> >
>> > Using Cassandra 0.5.1 stable & thrift-0.2.0.
>> >
>> > It turns out all the CPU time is going to the PHP client process - the
>> > JVM operating the Cassandra server isn't breaking much of a sweat.
>> >
>> > For reads the latency is often up to 1 second to fetch a row
>> > containing ~2000 columns, or around 300ms to fetch a 500-column wide
>> > row.  This is with get_slice(), and a predicate specifying the start &
>> > finish range.
>> >
>> > Using cachegrind and inspecting the code inside the Thrift bindings
>> > makes it pretty clear why the performance is so bad, particularly on
>> > reads. The biggest culprit is the translation code which casts data
>> > back and forth into binary representations for sending over the wire
>> > to the Cassandra server.
>> >
>> > There seems to be some 32-bit specific code which iterates heavily,
>> > apparently due to a limitation in PHP's implementation of LONGs.
>> >
>> > However, testing on a 64-bit host doesn't yield any performance
>> > improvement.
>> >
>> > More surprisingly, if I compile and enable the PHP native thrift
>> > bindings (following this guide
>> > https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
>> > read performance actually degrades by another 50%.  I have verified
>> > that the Thrift code is recognizing and using the native PHP functions
>> > provided by the library.
>> >
>> > I've tested all of this on both 32-bit and 64-bit installations of
>> > both PHP 5.1 & 5.2.  Results are the same in all cases.
>> >
>> > My environment is vanilla CentOS 5.4 server installations inside
>> > VMware on a 4-core 64-bit host with plenty of RAM and fast disks.
>> >
>> > Has anyone been able to produce decent performance with PHP &
>> > Cassandra?  If so, how have you done it?
>> >
>> > Thanks,
>> > Jules
>> >
>
> --
> David Strauss
>   | david@fourkitchens.com
>   | +1 512 577 5827 [mobile]
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>