cassandra-user mailing list archives

From Peter Schuller <>
Subject Re: mmap
Date Thu, 15 Jul 2010 16:41:36 GMT
> Can someone please explain the mmap issue.
> mmap is default for all storage files for 64bit machines.
> according to this case
> it might not be a good thing.
> Is it right to say that you should use mmap only if your MAX expected data
> is smaller than the MIN free RAM that could be in your system?

Not really. That is, the intent of mmap is to let the OS dynamically
choose what gets swapped in and out. The practical problem is that the
OS will often tend to swap too much. I got the impression jbellis
wasn't convinced, but my anecdotal experience is that this is a much
larger problem for mmap()ed data than for regular buffer cached data
- presumably, or so my assumption has been, because in the case of
the buffer cache the kernel has direct knowledge that it's cache only,
while with mmap() the data is directly competing with regular
application memory (I haven't actually checked the source; I suppose
I should).

One thing you can do is decrease swappiness (assuming Linux; check out
/proc/sys/vm/swappiness) and see if it helps. But in general, you
don't have, to my knowledge, good direct control over swapping.
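
For reference, a minimal sketch of inspecting and lowering swappiness
from a script, assuming Linux. The helper names are made up for
illustration; writing requires root, the change does not survive a
reboot, and the value 10 below is only an example, not a recommendation.

```python
from pathlib import Path

# Standard location on Linux; the helpers take a path argument so they
# can be pointed at a scratch file when experimenting.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness as an int, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Set swappiness. Needs root; lost on reboot unless set via sysctl."""
    if value < 0:
        raise ValueError("swappiness must be non-negative")
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    print("current swappiness:", read_swappiness())
    # write_swappiness(10)  # uncomment to lower it (example value only)
```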

As noted in the thread, the best bet would probably be to make the JVM
use mlock()/mlockall() to guarantee that the JVM doesn't swap anything
out, and then let the OS do its thing with any remaining data.
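
To show what the mlockall() call itself looks like, here is a rough
sketch using Python's ctypes rather than the JVM (from Java it would
need some native bridge, which is not shown). It assumes Linux: the
flag values are the Linux ones from <sys/mman.h>, and the function
names are hypothetical. The call commonly fails with ENOMEM or EPERM
unless RLIMIT_MEMLOCK is raised or the process has CAP_IPC_LOCK.

```python
import ctypes
import ctypes.util
import os

# Linux values from <sys/mman.h>; other platforms may use different ones.
MCL_CURRENT = 1  # lock pages that are currently mapped
MCL_FUTURE = 2   # also lock pages mapped in the future


def _libc():
    # Fall back to the symbols of the running process if libc is not found.
    return ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


def try_mlockall(flags=MCL_CURRENT | MCL_FUTURE):
    """Ask the kernel to keep this process's pages in RAM.

    Returns (True, 0) on success, otherwise (False, errno).
    """
    if _libc().mlockall(flags) == 0:
        return True, 0
    return False, ctypes.get_errno()


def unlock_all():
    """Undo a previous mlockall(); returns True on success."""
    return _libc().munlockall() == 0


def describe(result):
    """Turn the (ok, errno) pair into a short human-readable message."""
    ok, err = result
    return "memory locked" if ok else f"mlockall failed: {os.strerror(err)}"
```

Calling describe(try_mlockall()) at startup would report whether the
lock took effect; a process that cannot lock should probably just log
that and carry on.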

That said, certainly if the total amount of data is less than the
minimum amount of memory left free after the JVM heap is accounted
for, you're much less likely to see swapping. But it's not the intent
that you should only use mmap() under such circumstances.
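
As a back-of-the-envelope version of that comparison, a small sketch
follows. All the numbers are invented for illustration, and it is
optimistic in that a JVM uses memory beyond its configured heap.

```python
GIB = 1024 ** 3


def memory_left_for_os(total_ram, jvm_heap, other_resident=0):
    """Bytes left for the OS (page cache, mmap()ed files) after the heap."""
    return total_ram - jvm_heap - other_resident


def data_fits(data_size, total_ram, jvm_heap, other_resident=0):
    """True if all data could stay resident without evicting anything."""
    return data_size <= memory_left_for_os(total_ram, jvm_heap, other_resident)


# Hypothetical box: 16 GiB RAM, 8 GiB heap, 1 GiB for everything else.
left = memory_left_for_os(16 * GIB, 8 * GIB, 1 * GIB)  # 7 GiB
small = data_fits(5 * GIB, 16 * GIB, 8 * GIB, 1 * GIB)   # True
large = data_fits(12 * GIB, 16 * GIB, 8 * GIB, 1 * GIB)  # False
```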

Also, personally I'm interested in hearing what kind of performance
impacts people have *actually* seen with standard I/O, especially if
Cassandra is configured to cache a significant amount of data in
RAM itself. I'm a bit skeptical about claims of extreme performance
differences, in spite of syscalls being expensive.
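
For anyone who wants to measure rather than guess, here is a rough
sketch of the shape such a comparison could take: random reads from
one file, once via pread() and once via mmap(). The function names are
hypothetical, and interpreter overhead will blur the difference, so
treat it as a template for an experiment and not as evidence either
way. It assumes a Unix-like system (os.pread) and a file at least
read_size bytes long.

```python
import mmap
import os
import random
import time


def make_offsets(file_size, read_size, count, seed=0):
    """Reproducible random read offsets that stay inside the file."""
    rng = random.Random(seed)
    return [rng.randrange(0, file_size - read_size + 1) for _ in range(count)]


def read_with_pread(path, offsets, read_size):
    """One pread() syscall per read; returns a cheap checksum."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for off in offsets:
            chunk = os.pread(fd, read_size, off)
            total += chunk[0] + chunk[-1]
    finally:
        os.close(fd)
    return total


def read_with_mmap(path, offsets, read_size):
    """Same reads through a read-only mapping; no syscall per read."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for off in offsets:
                chunk = m[off:off + read_size]  # copies bytes out of the map
                total += chunk[0] + chunk[-1]
    return total


def compare(path, read_size=4096, count=10_000):
    """Return {method: (elapsed_seconds, checksum)} for both methods."""
    offsets = make_offsets(os.path.getsize(path), read_size, count)
    results = {}
    for name, fn in (("pread", read_with_pread), ("mmap", read_with_mmap)):
        start = time.perf_counter()
        checksum = fn(path, offsets, read_size)
        results[name] = (time.perf_counter() - start, checksum)
    return results
```

The two checksums should match, which confirms both paths read the
same bytes. Results will depend heavily on whether the file is already
in the page cache, so it is worth running with both a warm and a cold
cache.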

/ Peter Schuller
