incubator-cassandra-user mailing list archives

From Kyusik Chung <kyu...@discovereads.com>
Subject Re: performance tuning - where does the slowness come from?
Date Thu, 06 May 2010 18:56:01 GMT
I'd like to add one caveat to Weijun's statement.  I agree with everything, except when your access
pattern doesn't look like a random sampling of data across all your sstables.  If it turns
out that at any given time you're doing many repeated hits to a smaller subset of keys, then
using mmap should be ok even if your live sstables are much larger than available memory.
 The key is to have enough memory available (pre-mmap) so that there are few page-in operations
relative to client read requests.
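
A rough way to see this effect is a toy simulation of the page cache. The sketch below is only an illustration under stated assumptions: it models the OS cache as a plain LRU over page numbers (the real Linux page cache is more elaborate), and the function names and workload parameters (page_in_ratio, hot_pages, hot_fraction) are made up for the example, not taken from Cassandra. It compares page-ins per read for a uniformly random workload against one where most reads hit a small hot subset.

```python
import random
from collections import OrderedDict


def page_in_ratio(accesses, cache_pages):
    """Return page-ins per read for a sequence of page numbers.

    Assumption: the OS page cache is modeled as a simple LRU holding
    at most cache_pages pages.
    """
    cache = OrderedDict()
    page_ins = 0
    reads = 0
    for page in accesses:
        reads += 1
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            page_ins += 1  # miss: the page has to be read from disk
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return page_ins / reads if reads else 0.0


def uniform_workload(rng, total_pages, reads):
    """Every read lands on a uniformly random page (random sampling)."""
    return [rng.randrange(total_pages) for _ in range(reads)]


def hot_subset_workload(rng, total_pages, reads, hot_pages, hot_fraction):
    """A fraction hot_fraction of reads lands on the first hot_pages pages."""
    accesses = []
    for _ in range(reads):
        if rng.random() < hot_fraction:
            accesses.append(rng.randrange(hot_pages))
        else:
            accesses.append(rng.randrange(total_pages))
    return accesses


if __name__ == "__main__":
    rng = random.Random(0)
    total_pages = 100_000  # data set is 10x larger than the cache
    cache_pages = 10_000
    reads = 50_000
    uniform = page_in_ratio(uniform_workload(rng, total_pages, reads), cache_pages)
    hot = page_in_ratio(
        hot_subset_workload(rng, total_pages, reads, hot_pages=5_000, hot_fraction=0.9),
        cache_pages,
    )
    print(f"uniform access:    {uniform:.2f} page-ins per read")
    print(f"hot-subset access: {hot:.2f} page-ins per read")
```

With the data set ten times larger than the cache, the uniform workload pages in on nearly every read, while the hot-subset workload needs far fewer page-ins per read because the hot pages stay resident.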

Also, I suppose if you don't have a lot of repeat hits per key, mmap probably doesn't buy you a
ton either, unless your rows are very skinny and lots of them fit in a page. As far as I
can tell, Linux lazily pages in data that's been mmap-ed.
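
To make the lazy paging and the "skinny rows" point concrete, here is a minimal sketch using Python's standard mmap module. It is not how Cassandra reads sstables; the file layout, the helper names (make_data_file, map_file, read_row, rows_per_page) and the row sizes are assumptions for illustration. Mapping the file only reserves address space equal to the file size; on Linux the kernel reads a page from disk the first time an address inside it is touched.

```python
import mmap
import os
import tempfile


def make_data_file(size_bytes, marker_offset, marker):
    """Create a temp file of size_bytes with marker written at marker_offset."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size_bytes)  # extend the file; the new bytes read as zeros
        f.seek(marker_offset)
        f.write(marker)
    return path


def map_file(path):
    """Map the whole file read-only; no file data is read at this point."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file". The mapping stays valid
        # after f is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_row(mapped, offset, length):
    """Touch only the pages covering [offset, offset + length)."""
    return mapped[offset:offset + length]


def rows_per_page(row_bytes, page_bytes=mmap.PAGESIZE):
    """How many fixed-size rows share one page (hypothetical fixed row size)."""
    return page_bytes // row_bytes


if __name__ == "__main__":
    page = mmap.PAGESIZE
    path = make_data_file(64 * page, 10 * page, b"row-42")
    mapped = map_file(path)
    try:
        print("mapped bytes (virtual address space):", len(mapped))
        print("row read from page 10:", read_row(mapped, 10 * page, 6))
        print("64-byte rows per page:", rows_per_page(64))
    finally:
        mapped.close()
        os.remove(path)
```

If rows are small, one page-in brings many neighbouring rows into memory at once, which is where mmap can still help without repeat hits per key.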

(apologies for describing mmap inaccurately earlier in the thread)

Kyusik Chung

On May 6, 2010, at 11:05 AM, Weijun Li wrote:

> I just used Linux "top" to see the amount of virtual memory used by the JVM. When you turn
> on mmap, this number is equal to the size of your live sstables. And if you turn off mmap,
> the VIRT will be close to the Xmx of your JVM.
> 
> Anyway, for mmap, in order for you to access the data in the buffer or virtual address, the
> OS has to read/page in the data to a block of physical memory and map your virtual address
> to that physical memory block. So if you use random partitioner you'll most likely force Linux
> to page in/out all the time. In this case, disabling mmap and letting Cassandra use random
> file access seems to make more sense. mmap should be used when you have enough RAM for the OS
> to cache most or all of your data files.
> 
> -Weijun
> 
> On Thu, May 6, 2010 at 10:49 AM, Vick Khera <vivek@khera.org> wrote:
> On Thu, May 6, 2010 at 1:06 PM, Weijun Li <weijunli@gmail.com> wrote:
> > In this case using mmap will cause Cassandra to use sometimes > 100G virtual
> > memory which is much more than the physical ram, since we are using random
> > partitioner the OS will be busy doing swap.
> 
> mmap uses the virtual address space to reference bits on the disk; it
> does *NOT* use physical or virtual memory to copy that data other than
> perhaps any disk buffer cache from reading the file (which you would
> have anyhow).  Your memory usage tools will report high memory usage
> because they tell you how much virtual address space you have
> allocated.
> 

