incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: need some clarification on recommended memory size
Date Thu, 17 May 2012 05:04:48 GMT
> The read rate that I have been seeing is about 3MB/sec, and that is reading the raw
> bytes... using a string serializer the rate is even lower, about 2.2MB/sec.
Can we break this down a bit:

Is this a single client?
How many columns is it asking for?
What sort of query are you sending, slice or named columns?
From the client side, how long does a single read take?
What is the write workload like? It sounds like it's write once, read many.

Use nodetool cfstats to see what the read latency is on a single node (see
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/).
Is there much difference between this and the latency from the client's perspective?
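It can also help to translate the observed MB/sec into per-volume latency. A rough back-of-envelope sketch (the ~2.5KB column size and ~250 columns per row are the averages quoted further down in this thread; the arithmetic is mine):

```python
# Back-of-envelope: translate a single client's MB/sec into per-volume read latency.
avg_column_bytes = 2.5 * 1024                        # ~2.5KB per column (from the thread)
columns_per_row = 250                                # ~250 columns per row (from the thread)
row_bytes = avg_column_bytes * columns_per_row       # ~625KB per volume

throughput_bytes_per_sec = 3 * 1024 * 1024           # observed ~3MB/sec
rows_per_sec = throughput_bytes_per_sec / row_bytes  # ≈4.9 volumes/sec
latency_ms = 1000 / rows_per_sec                     # ≈200ms per volume read

print(f"{rows_per_sec:.1f} volumes/sec, ~{latency_ms:.0f}ms per volume")
```

If a single round trip really is on the order of 200ms per volume, then with one client the aggregate MB/sec is bounded by that latency, and adding client-side parallelism will raise throughput far more than server-side tuning.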



> Using JNA may help, but a blog article seems to say it only increases it by about 13%,
> which is not very significant when the base performance is in single-digit MBs.
There are other reasons to have JNA installed: more efficient snapshots, and advising the
OS when file operations should not be cached.

> Our environment is virtualized, and the disks are actually SAN over Fibre Channel, so
> I don't know if that has an impact on performance as well.
Memory speed > network speed: anything served from the OS page cache never touches the SAN at all.
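To give that inequality a rough sense of scale (the 4Gb/s Fibre Channel link speed is an assumption for illustration; the actual link speed is not stated in the thread):

```python
# Rough orders of magnitude; the 4Gb/s FC link speed is an assumed figure,
# not something stated in the thread.
fc_link_gbit = 4                          # a common Fibre Channel speed of the era
fc_mb_per_sec = fc_link_gbit * 1000 / 8   # ~500 MB/sec line rate, before protocol overhead
ram_mb_per_sec = 10 * 1024                # DDR3-era memory bandwidth, order ~10GB/sec

print(f"FC link: ~{fc_mb_per_sec:.0f} MB/sec; RAM: ~{ram_mb_per_sec} MB/sec "
      f"(~{ram_mb_per_sec / fc_mb_per_sec:.0f}x)")
```

So every read that the page cache can absorb is roughly an order of magnitude faster than one that has to go out over the SAN, quite apart from seek and queueing latency on the array.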

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/05/2012, at 12:35 AM, Yiming Sun wrote:

> Thanks Aaron. The reason I raised the question about memory requirements is that we
> are seeing very low read performance on Cassandra.
> 
> We are using Cassandra as the backend for an IR repository, and granted, the size of
> each column is very small (OCRed text). Each row represents a book volume, and the
> columns of the row represent pages of the volume. The average size of a column's text
> is 2-3KB, and each row has about 250 columns (this varies quite a bit from one volume
> to another).
> 
> The read rate that I have been seeing is about 3MB/sec, and that is reading the raw
> bytes... using a string serializer the rate is even lower, about 2.2MB/sec. To retrieve
> each volume, a slice query is issued via Hector that specifies the row key (the volume)
> and a list of column keys (pages), and the consistency level is set to ONE. So I am a
> bit lost trying to figure out how to increase the performance. Using JNA may help, but
> a blog article seems to say it only increases it by about 13%, which is not very
> significant when the base performance is in single-digit MBs.
> 
> Do you have any suggestions?
> 
> Oh, another thing is that you mentioned memory-mapped files. Our environment is
> virtualized, and the disks are actually SAN over Fibre Channel, so I don't know if
> that has an impact on performance as well. Would greatly appreciate any help. Thanks.
> 
> -- Y.
> 
> On Wed, May 16, 2012 at 5:48 AM, aaron morton <aaron@thelastpickle.com> wrote:
> The JVM will not swap out if you have JNA.jar on the path or you have disabled swap on
> the machine (the simplest thing to do).
> 
> Cassandra uses memory-mapped file access. If you have 16GB of RAM, 8 will go to the
> JVM and the rest can be used by the OS to cache files (plus the off-heap stuff).
> 
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/05/2012, at 11:12 AM, Yiming Sun wrote:
> 
>> Thanks Tyler... so my understanding is, even if Cassandra doesn't do off-heap
>> caching, having large enough memory minimizes the chance of swapping the Java heap
>> to disk. Is that correct?
>> 
>> -- Y.
>> 
>> On Tue, May 15, 2012 at 6:26 PM, Tyler Hobbs <tyler@datastax.com> wrote:
>> On Tue, May 15, 2012 at 3:19 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
>> Hello,
>> 
>> I was reading the Apache Cassandra 1.0 Documentation PDF dated May 10, 2012, and
>> had some questions on what the recommended memory size is.
>> 
>> Below is the snippet from the PDF. Bullet 1 suggests having 16-32GB of RAM, yet
>> Bullet 2 suggests limiting the Java heap size to no more than 8GB. My understanding
>> is that Cassandra is implemented purely in Java, so all memory it sees and uses is
>> the JVM heap.
>> 
>> The main way that additional RAM helps is through the OS page cache, which will store
>> hot portions of SSTables in memory. Additionally, Cassandra can now do off-heap
>> caching.
>> 
>> So can someone help me understand the discrepancy between 16-32GB of RAM and 8GB
>> of heap? Thanks.
>> 
>> == snippet ==
>> Memory
>> The more memory a Cassandra node has, the better the read performance. More RAM
>> allows for larger cache sizes and reduces disk I/O for reads. More RAM also allows
>> memory tables (memtables) to hold more recently written data. Larger memtables lead
>> to fewer SSTables being flushed to disk and fewer files to scan during a read. The
>> ideal amount of RAM depends on the anticipated size of your hot data.
>> 
>> • For dedicated hardware, a minimum of 8GB of RAM is needed. DataStax recommends
>> 16GB - 32GB.
>> 
>> • Java heap space should be set to a maximum of 8GB or half of your total RAM,
>> whichever is lower. (A greater heap size has more intense garbage collection
>> periods.)
>> 
>> • For a virtual environment, use a minimum of 4GB, such as Amazon EC2 Large
>> instances. For production clusters with a healthy amount of traffic, 8GB is more
>> common.
>> 
>> 
>> 
>> -- 
>> Tyler Hobbs
>> DataStax
>> 
>> 
> 
> 

