incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: cassandra + avro | python client vs java client
Date Wed, 27 Oct 2010 21:21:02 GMT
Then you should use Thrift from Python if you are concerned about
speed.  (I think the speed penalty there is only about 2x with the
C extension.)

On Wed, Oct 27, 2010 at 4:15 PM, Koert Kuipers
<Koert.Kuipers@diamondnotch.com> wrote:
> It does not have a C extension, as far as I know.
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Wednesday, October 27, 2010 5:01 PM
> To: user
> Subject: Re: cassandra + avro | python client vs java client
>
> Does Avro have a Python C extension yet?
>
> If not, 10x is right in line with how much faster I would expect Java
> to be than pure Python.
>
> On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers
> <Koert.Kuipers@diamondnotch.com> wrote:
>> Hey all,
>>
>> I have Cassandra 0.7 (a nightly build from mid-September) running on one
>> test machine with the Avro interface. The node holds about 16 million
>> values across 10k keys.
>>
>> As a simple test I ran two queries from a client: one that asks for all
>> columns of 100 keys, and one that asks for all columns of a single key
>> (which I know has a lot of columns). I am not using any buffering for
>> columns. I ran the tests multiple times to make sure file caching on the
>> server wouldn't skew the comparison.
>>
>> Using a Java client the results are:
>>
>> *** test1 ***
>> running test get_range_slices
>> 2.672 seconds.
>> 100 keys
>> 81849 total columns
>>
>> *** test2 ***
>> running test multiget_slice
>> 1.0 seconds.
>> 1 keys
>> 36626 total columns
>>
>> That's pretty impressive to me. I also later confirmed that with multiple
>> nodes the query across multiple keys is much faster. Using a client pool
>> would probably speed it up further.
>>
>> Then I ran a Python client. The results are:
>>
>> *** test1 ***
>> client:rpc get_range_slices
>> client:rpc call took 30.6 seconds
>> 100 keys
>> 81849 total columns
>>
>> *** test2 ***
>> client:rpc multiget_slice
>> client:rpc call took 13.9 seconds
>> 1 keys
>> 36626 total columns
>>
>> So the Python client took about 11.5 times as long on the first query and
>> 13.9 times as long on the second. That is a big difference! I suspect the
>> Avro deserialization is causing the slowdown (since the RPC call consists
>> of contacting the server, retrieving the results, and deserializing them).
>> Has anyone seen a similar performance difference? This would mean that for
>> a production system Python Avro is not acceptable to me at the moment.
>>
>> Both clients use only the Avro library.
>>
>> Best, Koert
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
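
The slowdown figures discussed in this thread can be rechecked with a few
lines of Python. The sketch below is only an illustration: the timings and
column counts are copied from the quoted message, while the names slowdown
and best_of are hypothetical helpers that do not come from any Cassandra,
Avro, or Thrift library. best_of mirrors the "run it several times" approach
described in the original post and could wrap any client call.

```python
import time

# Wall-clock timings (seconds) and column counts as reported in the thread.
JAVA_SECONDS = {"get_range_slices": 2.672, "multiget_slice": 1.0}
PYTHON_SECONDS = {"get_range_slices": 30.6, "multiget_slice": 13.9}
TOTAL_COLUMNS = {"get_range_slices": 81849, "multiget_slice": 36626}


def slowdown(slow_seconds, fast_seconds):
    """Return how many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


def best_of(fn, repeats=3):
    """Hypothetical helper: call fn several times, return the fastest run.

    Taking the minimum reduces the influence of one-off effects such as a
    cold file cache on the server.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for name in JAVA_SECONDS:
        ratio = slowdown(PYTHON_SECONDS[name], JAVA_SECONDS[name])
        rate = TOTAL_COLUMNS[name] / PYTHON_SECONDS[name]
        print(f"{name}: {ratio:.2f}x slower, "
              f"about {rate:.0f} columns/s in the Python client")
```

To separate network time from deserialization time, one option would be to
wrap only the decoding step of a client in best_of and compare it with the
timing of the full call.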
