cassandra-user mailing list archives

From Dan Feldman
Subject Cassandra read optimization
Date Thu, 19 Apr 2012 00:00:45 GMT
Hi all,

I'm trying to optimize moving data from Cassandra to HDFS using either a Ruby
or a Python client. Right now, I'm playing around on my staging server, an 8
GB single-node machine. My data in Cassandra (1.0.8) consists of 2 rows (for
now) with ~150k super columns each (I know, I know - super columns are
bad). Every super column has ~25 columns totaling ~800 bytes per super
column.

I should also mention that currently the database is static - there are no
writes/updates, only reads.

Anyway, in my Python/Ruby scripts, I'm taking slices of 5000 super columns
from a single row. It takes 13 seconds with Ruby and 8 seconds with
pycassa to get a single slice. In other words, it's currently reading
at roughly 300 to 500 kB per second. The time seems to scale linearly with
the length of a slice (e.g. 6 seconds for 2500 super columns with Ruby). If
I run nodetool cfstats while my script is running, it tells me that my read
latency on the column family is ~300ms.
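
A rough sketch of the arithmetic behind that throughput figure, assuming
~800 bytes per super column and decimal kilobytes (1 kB = 1000 bytes). The
names are made up for illustration and the inputs are only the approximate
numbers quoted above:

```python
# Back-of-the-envelope read throughput for one 5000-super-column slice.
# All inputs are the approximate figures from this message.
BYTES_PER_SUPER_COLUMN = 800   # ~25 columns, ~800 bytes in total
SLICE_LENGTH = 5000            # super columns fetched per slice


def throughput_kb_per_s(slice_length, seconds):
    """Approximate read rate in kB/s, with 1 kB = 1000 bytes."""
    return slice_length * BYTES_PER_SUPER_COLUMN / seconds / 1000.0


# Observed wall-clock time (seconds) to fetch one slice with each client.
timings = {"ruby": 13.0, "pycassa": 8.0}

rates = {client: throughput_kb_per_s(SLICE_LENGTH, secs)
         for client, secs in timings.items()}

for client, rate in sorted(rates.items()):
    print("%s: ~%.0f kB/s" % (client, rate))
```

So each slice is about 4 MB, which works out to roughly 500 kB/s with
pycassa and a bit over 300 kB/s with Ruby.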

I assume that this is not normal, so I was wondering what parameters I
could tweak to improve the performance.

Dan F.
