incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: cassandra + avro | python client vs java client
Date Wed, 27 Oct 2010 21:00:38 GMT
Does Avro have a Python C extension yet?

If not, 10x is right in line with how much faster I would expect Java
to be than pure Python.
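
One way to check whether pure-Python decoding really dominates the call time is to profile the client. The sketch below is only an illustration: `decode_longs` is a hypothetical stand-in for a pure-Python decode loop, not the avro library's decoder, and `profile_call` is a helper name invented here. In a real test the function passed in would be the client call being measured.

```python
import cProfile
import io
import pstats
import struct


def decode_longs(payload):
    """Hypothetical stand-in for a pure-Python decode loop (not avro itself)."""
    return [struct.unpack_from(">q", payload, off)[0]
            for off in range(0, len(payload), 8)]


def profile_call(fn, *args):
    """Run fn under cProfile; return its result and a short text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


# 1000 big-endian 64-bit integers as a synthetic payload.
payload = struct.pack(">1000q", *range(1000))
values, report = profile_call(decode_longs, payload)
print(report)
```

If the decode functions account for most of the cumulative time in such a report, the slowdown is in deserialization rather than in the network round trip.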

On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers
<Koert.Kuipers@diamondnotch.com> wrote:
> Hey all,
>
> I have Cassandra 0.7 (a nightly build from mid-September) running on one
> test machine with the Avro interface. The node holds about 16 million values
> across 10k keys.
>
> As a simple test I ran two queries from a client: one that asks for all
> columns for 100 keys, and one that asks for all columns for a single key
> (which I know to have a lot of columns). I am not using any buffering
> for columns. I ran the tests multiple times to make sure file caching on
> the server wouldn’t skew the comparison.
>
>
>
> Using a Java client the results are:
>
> *** test1 ***
> running test get_range_slices
> 2.672 seconds.
> 100 keys
> 81849 total columns
>
> *** test2 ***
> running test multiget_slice
> 1.0 seconds.
> 1 keys
> 36626 total columns
>
>
>
> That’s pretty impressive to me. I also confirmed later that with multiple
> nodes the query across multiple keys is much faster. Using a client pool
> would probably speed it up further.
>
>
>
> Then I ran a Python client. The results are:
>
> *** test1 ***
> client:rpc get_range_slices
> client:rpc call took 30.6 seconds
> 100 keys
> 81849 total columns
>
> *** test2 ***
> client:rpc multiget_slice
> client:rpc call took 13.9 seconds
> 1 keys
> 36626 total columns
>
>
>
> So the Python client took about 11.5 times as long on the first query and
> 13.9 times as long on the second. That is a big difference! I suspect the
> Avro deserialization is causing the slowdown (since the rpc call consists of
> contacting the server, retrieving results and deserializing results). Has
> anyone seen a similar performance difference? This would mean that for a
> production system Python Avro is not acceptable to me at the moment.
>
>
>
> Both clients use only the Avro library.
>
>
>
> Best, Koert
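
The slowdown figures in the quoted message can be re-derived from the reported timings. The sketch below simply copies the numbers from the test output above; the dictionary and function names are made up for illustration.

```python
# Timings (seconds) and column counts copied from the quoted test output.
java_s = {"get_range_slices": 2.672, "multiget_slice": 1.0}
python_s = {"get_range_slices": 30.6, "multiget_slice": 13.9}
columns = {"get_range_slices": 81849, "multiget_slice": 36626}


def slowdown(test):
    """How many times longer the Python client took than the Java client."""
    return python_s[test] / java_s[test]


def columns_per_second(test, timings):
    """Column throughput for one test, given a client's timing table."""
    return columns[test] / timings[test]


for test in java_s:
    print(f"{test}: {slowdown(test):.2f}x slower, "
          f"Java {columns_per_second(test, java_s):.0f} cols/s, "
          f"Python {columns_per_second(test, python_s):.0f} cols/s")
```

Expressing the results as columns per second makes the two tests comparable even though they return different numbers of columns.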



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
