cassandra-user mailing list archives

From Yiming Sun <yiming....@gmail.com>
Subject Re: how can we get (a lot) more performance from cassandra
Date Wed, 16 May 2012 20:53:07 GMT
Will do, Oleg.  Again, thanks for the information.

-- Y.

On Wed, May 16, 2012 at 4:44 PM, Oleg Dulin <oleg.dulin@gmail.com> wrote:

> Please do keep us posted. We have a somewhat similar Cassandra utilization
> pattern, and I would like to know what your solution is...
>
>
>
> On 2012-05-16 20:38:37 +0000, Yiming Sun said:
>
>
> Thanks Oleg.  Another caveat from our side is, we have a very large data
> space (imagine picking 100 items out of 3 million; the chance of having 2
> items from the same bin is pretty low). We will experiment with row cache,
> and hopefully it will help, not the opposite (the tuning guide says row
> cache could be detrimental in some circumstances).
>
>
> -- Y.
>
>
> On Wed, May 16, 2012 at 4:25 PM, Oleg Dulin <oleg.dulin@gmail.com> wrote:
>
> Indeed. This is how we are trying to solve this problem.
>
>
> Our application has a built-in cache that resembles a supercolumn or
> standardcolumn data structure and has an API that resembles a combination of
> Pelops selector and mutator. You can do something like that for Hector.
>
>
> The cache is constrained and uses LRU to purge unused items and keep
> memory usage steady.
>
>
> It is not perfect and we still have bugs, but it cuts out 90% of
> Cassandra reads.
>
>
>
> On 2012-05-16 20:07:11 +0000, Mike Peters said:
>
>
> Hi Yiming,
>
>
> Cassandra is optimized for write-heavy environments.
>
>
> If you have a read-heavy application, you shouldn't be running your reads
> through Cassandra.
>
>
> On the bright side - Cassandra read throughput will remain consistent,
> regardless of your volume.  But you are going to have to "wrap" your reads
> with memcache (or redis), so that the bulk of your reads can be served from
> memory.
>
>
>
> Thanks,
>
> Mike Peters
>
>
> On 5/16/2012 3:59 PM, Yiming Sun wrote:
>
> Hello,
>
>
> I asked the question as a follow-up under a different thread, so I figure
> I should ask here instead in case the other one gets buried, and besides, I
> have a little more information.
>
>
> "We find the lack of performance disturbing" as we are only able to get
> about 3-4MB/sec read performance out of Cassandra.
>
>
> We are using Cassandra as the backend for an IR repository of digital
> texts. It is a read-mostly repository with occasional writes.  Each row
> represents a book volume, and each column of a row represents a page of the
> volume.  Granted the data size is small -- the average size of a column
> text is 2-3KB, and each row has about 250 columns (varies quite a bit from
> one volume to another).
>
>
> Currently we are running a 3-node cluster, and will soon upgrade to a
> 6-node setup.  Each node is a VM with 4 cores and 16GB of memory.  All VMs
> use SAN as disk storage.
>
>
> To retrieve a volume, a slice query is used via Hector that specifies the
> row key (the volume), and a list of column keys (pages), and the
> consistency level is set to ONE.  It is typical to retrieve multiple
> volumes per request.
>
>
> The read rate that I have been seeing is about 3-4 MB/sec, and that is
> reading the raw bytes... using string serializer the rate is even lower,
> about 2.2MB/sec.
>
>
> The server log shows that ParNew GC pauses frequently exceed 200ms,
> often in the range of 4-5 seconds, but nowhere near 15 seconds (which is an
> indication that JVM heap is being swapped out).
>
>
> Currently we have not added JNA.  From a blog post, it seems JNA is able
> to increase the performance by 13%, and we are hoping to increase the
> performance by something more like 1300% (3-4 MB/sec is just disturbingly
> low).  And we are hesitant to disable swap entirely since one of the nodes
> is running a couple of other services.
>
>
> Do you have any suggestions on how we may boost the performance?  Thanks!
>
>
> -- Y.
>
>
>
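
For readers following the thread, the client-side cache Oleg describes (bounded, LRU eviction, sitting in front of Cassandra reads) could look roughly like the sketch below. This is only an illustration under stated assumptions, not Oleg's implementation: the class name `LruReadThroughCache`, the `loader` function (a stand-in for whatever Hector or Pelops read the application really performs), and the key format in the demo are all hypothetical. It relies on the access-order mode of `java.util.LinkedHashMap`, and it counts hits and misses so that the concern Yiming raises (a very large key space with few repeated reads) can be measured rather than guessed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Bounded read-through cache with least-recently-used eviction (illustrative sketch). */
class LruReadThroughCache<K, V> {
    // Hypothetical loader: stands in for the real Cassandra read (e.g. a slice query).
    private final Function<K, V> loader;
    private final Map<K, V> entries;
    private long hits = 0;
    private long misses = 0;

    LruReadThroughCache(final int capacity, Function<K, V> loader) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; evict the least recently used entry when over capacity.
                return size() > capacity;
            }
        };
    }

    /** Returns the cached value, or loads it through the loader on a miss. */
    synchronized V get(K key) {
        V value = entries.get(key); // a successful get also marks the entry as recently used
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        if (value != null) {
            entries.put(key, value);
        }
        return value;
    }

    /** Drops a key, e.g. after the application writes a new value to Cassandra. */
    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized long hits() { return hits; }
    synchronized long misses() { return misses; }
    synchronized int size() { return entries.size(); }

    public static void main(String[] args) {
        // Hypothetical "volume:page" keys; the lambda fakes the backend read.
        LruReadThroughCache<String, String> cache =
            new LruReadThroughCache<>(2, key -> "page text for " + key);
        cache.get("vol1:p1"); // miss
        cache.get("vol1:p2"); // miss
        cache.get("vol1:p1"); // hit
        cache.get("vol1:p3"); // miss, evicts vol1:p2
        cache.get("vol1:p2"); // miss again, evicts vol1:p1
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses());
        // prints "hits=1 misses=4"
    }
}
```

Whether such a cache helps depends on how often the same rows are requested again. With requests spread thinly over millions of volumes, the hit counter may stay low, in which case a shared memcache or redis tier, as Mike suggests, or Cassandra's own row cache is subject to the same limitation.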
