cassandra-user mailing list archives

From Koert Kuipers <Koert.Kuip...@diamondnotch.com>
Subject cassandra + avro | python client vs java client
Date Wed, 27 Oct 2010 16:59:31 GMT
Hey all,
I have Cassandra 0.7 (a nightly build from mid-September) running on one test machine with
the avro interface. The node holds about 16 million values across 10k keys.
As a simple test I ran 2 test queries from a client: one query where I ask for all columns
for 100 keys, and one query where I ask for all columns for one key (which I know to have a
lot of columns). I am not using any buffering for columns. I ran the tests multiple times to
make sure file caching on the server wouldn't mess up the comparison.

Using a java client the results are:
*** test1 ***
running test get_range_slices
2.672 seconds.
100 keys
81849 total columns
*** test2 ***
running test multiget_slice
1.0 seconds.
1 keys
36626 total columns

That's pretty impressive to me. I also later confirmed that with multiple nodes the query
across multiple keys is much faster. Using a client pool would probably speed it up further.

Then I ran a python client. The results are:
*** test1 ***
client:rpc get_range_slices
client:rpc call took 30.6 seconds
100 keys
81849 total columns
*** test2 ***
client:rpc multiget_slice
client:rpc call took 13.9 seconds
1 keys
36626 total columns

So the python client took roughly 11.5 times as long on the first query (30.6 s vs 2.672 s)
and 13.9 times as long on the second query (13.9 s vs 1.0 s). That is a big difference! I
suspect the avro deserialization is causing the slowdown (since the rpc call consists of
contacting the server, retrieving results and deserializing results). Has anyone seen a
similar performance difference? This would mean that for a production system python avro is
not acceptable to me at the moment....
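
One way to check the deserialization suspicion would be to run the python rpc call under the
standard-library profiler and see where the cumulative time goes. The following is only a
minimal sketch: `profile_call` is a hypothetical helper, and `fake_rpc_call` is a stand-in
that just builds a list, since the actual client code is not shown here. In a real run the
stand-in would be replaced by the function that issues get_range_slices or multiget_slice.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the 15 entries with the highest cumulative time,
    which should show whether deserialization or network I/O dominates.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


def fake_rpc_call(n_columns):
    # Hypothetical stand-in for the real avro rpc call; it only builds
    # a list of (name, value) pairs so that the sketch is runnable.
    return [("col%d" % i, i) for i in range(n_columns)]


result, report = profile_call(fake_rpc_call, 36626)
print(report)
```

If most of the cumulative time lands in the avro decoding functions rather than in socket
reads, that would support the suspicion; if it lands in the network layer, the cause is
elsewhere.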

Both clients use only the avro library.

Best, Koert
