cassandra-user mailing list archives

From Yiming Sun <yiming....@gmail.com>
Subject Re: how can we get (a lot) more performance from cassandra
Date Thu, 17 May 2012 01:44:38 GMT
Hi Aaron T.,  No, actually we haven't, but this sounds like a good
suggestion.  I can definitely try this before jumping into other things
such as enabling the row cache. Thanks!

-- Y.

On Wed, May 16, 2012 at 9:38 PM, Aaron Turner <synfinatic@gmail.com> wrote:

> On Wed, May 16, 2012 at 12:59 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
> > Hello,
> >
> > I asked the question as a follow-up under a different thread, so I figure I
> > should ask here instead in case the other one gets buried, and besides, I
> > have a little more information.
> >
> > "We find the lack of performance disturbing" as we are only able to get
> > about 3-4MB/sec read performance out of Cassandra.
> >
> > We are using Cassandra as the backend for an IR repository of digital texts.
> > It is a read-mostly repository with occasional writes.  Each row represents
> > a book volume, and each column of a row represents a page of the volume.
> > Granted the data size is small -- the average size of a column text is
> > 2-3KB, and each row has about 250 columns (varies quite a bit from one
> > volume to another).
> >
> > Currently we are running a 3-node cluster, and it will soon be upgraded to a
> > 6-node setup.  Each node is a VM with 4 cores and 16GB of memory.  All VMs
> > use SAN as disk storage.
> >
> > To retrieve a volume, a slice query is used via Hector that specifies the
> > row key (the volume), and a list of column keys (pages), and the consistency
> > level is set to ONE.  It is typical to retrieve multiple volumes per
> > request.
> >
> > The read rate that I have been seeing is about 3-4 MB/sec, and that is
> > reading the raw bytes... using string serializer the rate is even lower,
> > about 2.2MB/sec.
> >
> > The server log shows the GC ParNew frequently gets longer than 200ms, often
> > in the range of 4-5 seconds.  But nowhere near 15 seconds (which is an
> > indication that JVM heap is being swapped out).
> >
> > Currently we have not added JNA.  From a blog post, it seems JNA is able to
> > increase the performance by 13%, and we are hoping to increase the
> > performance by something more like 1300% (3-4 MB/sec is just disturbingly
> > low).  And we are hesitant to disable swap entirely since one of the nodes
> > is running a couple of other services.
> >
> > Do you have any suggestions on how we may boost the performance?  Thanks!
>
> Have you tried using more threads on the client side?  Generally
> speaking, when I need faster read/write performance I look for ways to
> parallelize my requests and it scales pretty much linearly.
>
>
> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>
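
For readers of the archive, the client-side parallelism suggested above can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name ParallelVolumeFetch, the method fetchAll, and the fetchVolume function are all hypothetical, and fetchVolume stands in for whatever actually reads one volume (for example, one Hector slice query per row key). It assumes the underlying client can safely be called from several threads at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelVolumeFetch {

    // Fetches every volume key on a fixed-size thread pool.
    // fetchVolume is a hypothetical stand-in for the real per-volume read;
    // it is NOT a Hector API and must be supplied by the caller.
    public static Map<String, byte[]> fetchAll(List<String> volumeKeys,
                                               Function<String, byte[]> fetchVolume,
                                               int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per volume so the reads overlap in time.
            List<Future<byte[]>> pending = new ArrayList<>();
            for (String key : volumeKeys) {
                Callable<byte[]> task = () -> fetchVolume.apply(key);
                pending.add(pool.submit(task));
            }
            // Collect results in the order the keys were given.
            Map<String, byte[]> results = new LinkedHashMap<>();
            for (int i = 0; i < volumeKeys.size(); i++) {
                results.put(volumeKeys.get(i), pending.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake fetch for demonstration: returns the key's own bytes.
        Map<String, byte[]> volumes = fetchAll(
                Arrays.asList("vol-1", "vol-2", "vol-3"),
                key -> key.getBytes(),
                2);
        System.out.println("fetched " + volumes.size() + " volumes");
    }
}
```

The pool size is a tuning knob: starting small and increasing it while watching the aggregate read rate and the server-side GC pauses is probably the safest way to see whether throughput really scales with the number of client threads on a given cluster.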
