cassandra-user mailing list archives

From Koert Kuipers <Koert.Kuip...@diamondnotch.com>
Subject RE: cassandra + avro | python client vs java client
Date Wed, 27 Oct 2010 21:15:54 GMT
It does not have a C extension, as far as I know.

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Wednesday, October 27, 2010 5:01 PM
To: user
Subject: Re: cassandra + avro | python client vs java client

Does Avro have a Python C extension yet?

If not, 10x is right in line with how much faster I would expect Java
to be than pure Python.

On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers
<Koert.Kuipers@diamondnotch.com> wrote:
> Hey all,
>
> I have Cassandra 0.7 (nightly build from mid-September) running on one
> test machine with the avro interface. The node holds about 16 million
> values across 10k keys.
>
> As a simple test I ran 2 test queries from a client: one query where I ask
> for all columns for 100 keys, and one query where I ask for all columns for
> one key (which I know to have a lot of columns). I am not using any
> buffering for columns. I ran the tests multiple times to make sure file
> caching on the server wouldn't mess up the comparison.
>
> Using a java client the results are:
>
> *** test1 ***
> running test get_range_slices
> 2.672 seconds.
> 100 keys
> 81849 total columns
>
> *** test2 ***
> running test multiget_slice
> 1.0 seconds.
> 1 keys
> 36626 total columns
>
> That's pretty impressive to me. I also later confirmed that with multiple
> nodes the query across multiple keys is much faster. Using a client pool
> would probably speed it up further.
>
> Then I ran a python client. The results are:
>
> *** test1 ***
> client:rpc get_range_slices
> client:rpc call took 30.6 seconds
> 100 keys
> 81849 total columns
>
> *** test2 ***
> client:rpc multiget_slice
> client:rpc call took 13.9 seconds
> 1 keys
> 36626 total columns
>
> So the python client took 11.4 times as long with the first query and 13.9
> times as long with the second query. That is a big difference! I suspect the
> avro deserialization is causing the slowdown (since the rpc call consists of
> contacting the server, retrieving results and deserializing results). Has
> anyone seen a similar performance difference? This would mean that for a
> production system python avro is not acceptable to me at the moment....
>
>
>
> Both clients use only the avro library.
>
>
>
> Best, Koert
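
For anyone who wants to check the slowdown figures quoted above, here is a minimal sketch in Python. The timings and column counts are copied from the test output in the quoted message; the names `results`, `slowdown` and `columns_per_second` are made up for this illustration and are not part of any Cassandra or Avro client.

```python
# Timings (seconds) and column counts copied from the quoted test output.
# The dictionary layout and helper names are illustrative assumptions.
results = {
    "get_range_slices": {"java_s": 2.672, "python_s": 30.6, "columns": 81849},
    "multiget_slice": {"java_s": 1.0, "python_s": 13.9, "columns": 36626},
}


def slowdown(java_s, python_s):
    """How many times longer the python client took than the java client."""
    return python_s / java_s


def columns_per_second(columns, seconds):
    """Rough throughput: columns returned per second of wall-clock time."""
    return columns / seconds


for name, r in results.items():
    ratio = slowdown(r["java_s"], r["python_s"])
    java_rate = columns_per_second(r["columns"], r["java_s"])
    python_rate = columns_per_second(r["columns"], r["python_s"])
    print(
        f"{name}: python/java = {ratio:.2f}x, "
        f"java = {java_rate:.0f} cols/s, python = {python_rate:.0f} cols/s"
    )
```

Both ratios come out above the 10x ballpark mentioned in the reply. Note that these are end-to-end wall-clock numbers, so they cannot by themselves separate network time from deserialization time.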



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

