cassandra-user mailing list archives

From Oleg Dulin <oleg.du...@gmail.com>
Subject Re: how can we get (a lot) more performance from cassandra
Date Wed, 16 May 2012 20:44:04 GMT
Please do keep us posted. We have a somewhat similar Cassandra 
utilization pattern, and I would like to know what your solution is...


On 2012-05-16 20:38:37 +0000, Yiming Sun said:

> Thanks Oleg.  Another caveat from our side is, we have a very large 
> data space (imagine picking 100 items out of 3 million; the chance of 
> having 2 items from the same bin is pretty low). We will experiment 
> with row cache, and hopefully it will help rather than hurt (the 
> tuning guide says row cache can be detrimental in some circumstances).
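> 
> For anyone following along, the row cache is enabled per column family 
> from cassandra-cli; on 1.1 it is something like the line below (exact 
> syntax varies by version, and the column family name here is made up):
> 
>     update column family Volumes with caching = 'rows_only';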
> 
> -- Y.
> 
> On Wed, May 16, 2012 at 4:25 PM, Oleg Dulin <oleg.dulin@gmail.com> wrote:
> Indeed. This is how we are trying to solve this problem.
> 
> Our application has a built-in cache that resembles a supercolumn or 
> standard-column data structure, with an API that resembles a 
> combination of Pelops' selector and mutator. You can do something 
> similar for Hector.
> 
> The cache is constrained and uses LRU to purge unused items and keep 
> memory usage steady.
> 
> It is not perfect and we still have bugs, but it cuts out roughly 90% 
> of our Cassandra reads.
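> 
> A minimal sketch of the LRU part, using a LinkedHashMap in access 
> order (illustrative only -- the names are made up and our real cache 
> is more involved):
> 
>     import java.util.LinkedHashMap;
>     import java.util.Map;
> 
>     // Bounded LRU cache keyed by "rowKey:columnName".
>     // Not thread-safe; wrap with Collections.synchronizedMap in practice.
>     public class ColumnCache<V> extends LinkedHashMap<String, V> {
>         private final int maxEntries;
> 
>         public ColumnCache(int maxEntries) {
>             super(16, 0.75f, true);     // accessOrder=true -> LRU ordering
>             this.maxEntries = maxEntries;
>         }
> 
>         @Override
>         protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
>             return size() > maxEntries; // evict the least-recently-used entry
>         }
>     }
> 
> Reads consult the cache first and only fall through to Cassandra on a 
> miss, which is where the ~90% reduction comes from.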
> 
> 
> On 2012-05-16 20:07:11 +0000, Mike Peters said:
> 
> Hi Yiming,
> 
> Cassandra is optimized for write-heavy environments.
> 
> If you have a read-heavy application, you shouldn't be running your 
> reads through Cassandra.
> 
> On the bright side - Cassandra read throughput will remain consistent, 
> regardless of your volume.  But you are going to have to "wrap" your 
> reads with memcache (or redis), so that the bulk of your reads can be 
> served from memory.
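> 
> A rough sketch of that read-through pattern with spymemcached (the 
> host, TTL, and readFromCassandra stub are all placeholders):
> 
>     import java.io.IOException;
>     import java.net.InetSocketAddress;
>     import net.spy.memcached.MemcachedClient;
> 
>     // Serve reads from memcached; fall back to Cassandra on a miss.
>     public class PageReader {
>         private final MemcachedClient memcache;
> 
>         public PageReader() throws IOException {
>             memcache = new MemcachedClient(
>                 new InetSocketAddress("localhost", 11211));
>         }
> 
>         public String getPage(String volumeKey, String pageName) {
>             String cacheKey = volumeKey + ":" + pageName;
>             String page = (String) memcache.get(cacheKey);   // memory first
>             if (page == null) {
>                 page = readFromCassandra(volumeKey, pageName);
>                 memcache.set(cacheKey, 3600, page);          // cache for an hour
>             }
>             return page;
>         }
> 
>         // Stand-in for the existing Hector read path.
>         private String readFromCassandra(String volumeKey, String pageName) {
>             throw new UnsupportedOperationException("stub");
>         }
>     }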
> 
> 
> Thanks,
> Mike Peters
> 
> On 5/16/2012 3:59 PM, Yiming Sun wrote:
> Hello,
> 
> I asked the question as a follow-up under a different thread, so I 
> figure I should ask here instead in case the other one gets buried, and 
> besides, I have a little more information.
> 
> "We find the lack of performance disturbing" as we are only able to get 
> about 3-4MB/sec read performance out of Cassandra.
> 
> We are using Cassandra as the backend for an IR repository of digital 
> texts. It is a read-mostly repository with occasional writes.  Each row 
> represents a book volume, and each column of a row represents a page of 
> the volume.  Granted, the data size is small -- the average size of a 
> column text is 2-3KB, and each row has about 250 columns (though this 
> varies quite a bit from one volume to another).
> 
> Currently we are running a 3-node cluster, which will soon be upgraded 
> to a 6-node setup.  Each node is a VM with 4 cores and 16GB of memory.  
> All VMs use SAN as disk storage.
> 
> To retrieve a volume, we issue a slice query via Hector that specifies 
> the row key (the volume) and a list of column keys (pages), with the 
> consistency level set to ONE.  It is typical to retrieve multiple 
> volumes per request.
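> 
> For reference, the read path looks roughly like this (the column 
> family name and serializers are illustrative):
> 
>     import me.prettyprint.cassandra.serializers.BytesArraySerializer;
>     import me.prettyprint.cassandra.serializers.StringSerializer;
>     import me.prettyprint.hector.api.Keyspace;
>     import me.prettyprint.hector.api.beans.ColumnSlice;
>     import me.prettyprint.hector.api.factory.HFactory;
>     import me.prettyprint.hector.api.query.QueryResult;
>     import me.prettyprint.hector.api.query.SliceQuery;
> 
>     // Fetch the requested pages (columns) of one volume (row).
>     ColumnSlice<String, byte[]> fetchPages(Keyspace keyspace,
>             String volumeKey, String[] pageNames) {
>         SliceQuery<String, String, byte[]> q = HFactory.createSliceQuery(
>                 keyspace, StringSerializer.get(), StringSerializer.get(),
>                 BytesArraySerializer.get());
>         q.setColumnFamily("Volumes");
>         q.setKey(volumeKey);
>         q.setColumnNames(pageNames);   // explicit page columns
>         QueryResult<ColumnSlice<String, byte[]>> result = q.execute();
>         return result.get();
>     }
> 
> When fetching multiple volumes per request, 
> HFactory.createMultigetSliceQuery can batch the rows into a single 
> round trip instead of issuing one slice query per volume.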
> 
> The read rate that I have been seeing is about 3-4 MB/sec, and that is 
> reading the raw bytes... using a string serializer the rate is even 
> lower, about 2.2 MB/sec.
> 
> The server log shows that ParNew GC pauses frequently exceed 200ms, 
> often in the range of 4-5 seconds -- but nowhere near 15 seconds 
> (which would indicate that the JVM heap is being swapped out).
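> 
> In case it is relevant, the detailed pause times can be captured in a 
> full GC log via the standard HotSpot flags, e.g. in cassandra-env.sh 
> (the log path here is just an example):
> 
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
>     JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"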
> 
> Currently we have not added JNA.  From a blog post, it seems JNA is 
> able to increase performance by 13%, but we are hoping to increase 
> performance by something more like 1300% (3-4 MB/sec is just 
> disturbingly low).  And we are hesitant to disable swap entirely, since 
> one of the nodes is running a couple of other services.
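> 
> As we understand it, JNA's main benefit here is that it lets Cassandra 
> mlockall() its memory so the heap cannot be swapped out, which would 
> be a middle ground short of disabling swap. Enabling it is apparently 
> just a matter of dropping jna.jar into Cassandra's lib/ directory and 
> raising the memlock limit, e.g. in /etc/security/limits.conf (the user 
> name is an assumption):
> 
>     cassandra  soft  memlock  unlimited
>     cassandra  hard  memlock  unlimited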
> 
> Do you have any suggestions on how we may boost the performance?  Thanks!
> 
> -- Y.
