incubator-cassandra-user mailing list archives

From Aaron Turner <>
Subject Re: Cassandra read optimization
Date Thu, 19 Apr 2012 01:59:43 GMT
On Wed, Apr 18, 2012 at 5:00 PM, Dan Feldman <> wrote:
> Hi all,
> I'm trying to optimize moving data from Cassandra to HDFS using either a Ruby
> or a Python client. Right now, I'm playing around on my staging server, an 8
> GB single node machine. My data in Cassandra (1.0.8) consist of 2 rows (for
> now) with ~150k super columns each (I know, I know - super columns are bad).
> Every super column has ~25 columns totaling ~800 bytes per super column.
> I should also mention that currently the database is static - there are no
> writes/updates, only reads.
> Anyways, in my python/ruby scripts, I'm taking slices 5000 super columns
> long from a single row.  It takes 13 seconds with ruby and 8 seconds with
> pycassa to get a single slice. Or, in other words, it's currently reading at
> speeds of less than 500 kB per second. The speed seems to be linear with the
> length of a slice (i.e. 6 seconds for 2500 scs for ruby). If I run nodetool
> cfstats while my script is running, it tells me that my read latency on the
> column family is ~300ms.
> I assume that this is not normal and thus was wondering what parameters I
> could tweak to improve the performance.

Is your client multi-threaded?  The single-threaded performance of
Cassandra isn't at all impressive; it really is designed for
dealing with a lot of simultaneous requests.
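
As a rough illustration of what I mean by multi-threaded, here is a minimal
Python sketch that issues several slice requests at once instead of one after
another. It assumes you already know the column-name boundaries of each slice.
The names fetch_all, fetch_range and fake_fetch_range are made up for this
example: fetch_range stands for whatever call your client (pycassa or
otherwise) uses to read one slice, and fake_fetch_range is only a stand-in so
the sketch runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_range, ranges, max_workers=8):
    """Call fetch_range(row_key, start, finish) for every range concurrently.

    Results come back in the same order as `ranges`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_range, *r) for r in ranges]
        return [f.result() for f in futures]


# Hypothetical stand-in for a real client slice call; replace it with a
# function that reads columns from `start` to `finish` in `row_key`.
def fake_fetch_range(row_key, start, finish):
    return {"row": row_key, "start": start, "finish": finish}


# Assumed slice boundaries, purely illustrative.
ranges = [
    ("row1", "a", "m"),
    ("row1", "n", "z"),
    ("row2", "a", "m"),
    ("row2", "n", "z"),
]
results = fetch_all(fake_fetch_range, ranges, max_workers=4)
```

Whether this helps on a single 8 GB node depends on where the time is going,
so treat it as something to measure rather than a guaranteed fix.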

Aaron Turner         Twitter: @synfinatic - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
