cassandra-user mailing list archives

From Schubert Zhang <>
Subject Re: mmap
Date Thu, 15 Jul 2010 16:54:36 GMT
I found that in a long-term random-read test on a large dataset, performance
with mmap is very bad.
See the attached chart in
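One way to reproduce a comparison like this is a random-read micro-benchmark that reads the same page-aligned offsets once through a memory mapping and once through explicit `pread()` calls. The sketch below is hypothetical: the 4 KiB read size, the function names and the test file are assumptions, not Schubert's actual test, and `os.pread` requires a Unix-like OS.

```python
import mmap
import os
import random
import time

PAGE = 4096  # bytes per random read; an assumed size, not from the thread


def read_mmap(mm: mmap.mmap, offset: int, size: int = PAGE) -> bytes:
    # Copy bytes out of the mapping; touching a non-resident page triggers a
    # page fault that the kernel services transparently.
    return mm[offset:offset + size]


def read_pread(fd: int, offset: int, size: int = PAGE) -> bytes:
    # One explicit pread() system call per lookup ("standard I/O").
    return os.pread(fd, size, offset)


def random_offsets(file_size: int, count: int, seed: int = 0) -> list[int]:
    # Page-aligned offsets spread uniformly over the file.
    rng = random.Random(seed)
    last = max(file_size - PAGE, 0)
    return [rng.randrange(0, last + 1, PAGE) for _ in range(count)]


def benchmark(path: str, count: int = 10_000) -> dict[str, float]:
    """Time `count` random reads through mmap and through pread (seconds)."""
    offsets = random_offsets(os.path.getsize(path), count)
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
            start = time.perf_counter()
            for off in offsets:
                read_mmap(mm, off)
            mmap_s = time.perf_counter() - start
        start = time.perf_counter()
        for off in offsets:
            read_pread(fd, off)
        pread_s = time.perf_counter() - start
    finally:
        os.close(fd)
    return {"mmap_s": mmap_s, "pread_s": pread_s}
```

The pread loop runs second and so benefits from pages the mmap loop already pulled into the page cache; to compare cold-cache behaviour, run each method in a separate process after dropping caches (as root: `sync; echo 3 > /proc/sys/vm/drop_caches`). A short run like this also won't reproduce the memory pressure that builds up over a long-term test on a dataset larger than RAM.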

On Fri, Jul 16, 2010 at 12:41 AM, Peter Schuller <> wrote:

> > Can someone please explain the mmap issue?
> > mmap is the default for all storage files on 64-bit machines.
> > According to this case,
> > it might not be a good thing.
> > Is it right to say that you should use mmap only if your MAX expected
> > data is smaller than the MIN free RAM that could be in your system?
> Not really. That is, the intent of mmap is to let the OS dynamically
> choose what gets swapped in and out. The practical problem is that the
> OS will often tend to swap too much. I got the impression jbellis
> wasn't convinced, but my anecdotal experience is that this is a much
> larger problem for mmap():ed data than for regular buffer cached data
> - presumably, or so my assumption has been, because in the case of
> the buffer cache the kernel has direct knowledge that it's cache only
> while with mmap() it's directly competing with regular application
> memory (I haven't actually checked the source; I suppose I should).
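A Linux-only sketch of one visible difference between the two paths: pages read with `pread()` stay in the kernel's page cache and are not charged to the reading process, whereas pages touched through a mapping are counted in the process's resident set size (`VmRSS` in `/proc/self/status`). This shows the accounting only; it does not show how the kernel's reclaim logic weighs those pages, which is the part Peter says he has not checked. The 16 MiB file size and the helper names are illustrative.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def vm_rss_kb() -> int:
    """Resident set size of this process in kB (Linux-only, read from /proc)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def rss_growth_kb(size: int = 16 * 1024 * 1024) -> tuple[int, int]:
    """Return (RSS growth after pread()-ing a file, RSS growth after touching it via mmap)."""
    with tempfile.TemporaryFile() as f:
        chunk = b"x" * PAGE
        for _ in range(size // PAGE):
            f.write(chunk)
        f.flush()
        fd = f.fileno()

        # Path 1: read every page with pread(); the data sits in the page
        # cache and each user-space copy is discarded immediately.
        before = vm_rss_kb()
        for off in range(0, size, PAGE):
            os.pread(fd, PAGE, off)
        via_pread = vm_rss_kb() - before

        # Path 2: touch one byte per page through a read-only mapping; those
        # pages are now mapped into this process and counted in its RSS.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
            before = vm_rss_kb()
            for off in range(0, size, PAGE):
                _ = mm[off]
            via_mmap = vm_rss_kb() - before
    return via_pread, via_mmap
```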
> One thing you can do is decrease swappiness (assuming Linux; check out
> /proc/sys/vm/swappiness) and see if it helps. But in general, you
> don't have, to my knowledge, good direct control over swapping
> policies.
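On Linux the knob Peter mentions is the `vm.swappiness` sysctl; lower values make the kernel less eager to swap out process memory when it needs to reclaim pages. A small sketch for inspecting and changing it follows. The helper names are hypothetical, writing requires root, and any particular value (10 in the test) is illustrative rather than a recommendation from this thread.

```python
from pathlib import Path

# Linux exposes the vm.swappiness tunable here; other OSes do not have it.
SWAPPINESS = Path("/proc/sys/vm/swappiness")


def read_swappiness(path: Path = SWAPPINESS) -> int:
    """Return the current vm.swappiness value (60 is a common default)."""
    return int(path.read_text().strip())


def set_swappiness(value: int, path: Path = SWAPPINESS) -> None:
    """Write a new value; needs root. Same effect as `sysctl -w vm.swappiness=<value>`.

    The change does not survive a reboot; add `vm.swappiness = <value>` to
    /etc/sysctl.conf to make it persistent.
    """
    if value < 0:
        raise ValueError("swappiness must be non-negative")
    path.write_text(f"{value}\n")
```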
> As noted in the thread, the best bet would probably be to make the JVM
> use mlock()/mlockall() to guarantee that the JVM doesn't swap anything
> out, and then let the OS do its thing with any remaining data.
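Java has no built-in API for `mlockall()`, so a JVM would need a native bridge such as JNA to make the call. The sketch below makes the same system call from Python through `ctypes`, purely as an illustration; the flag values are the Linux x86/ARM ones and `try_mlockall` is a hypothetical name. It defaults to `MCL_CURRENT` so that memory already mapped at call time is pinned while data files mapped later stay pageable, matching the idea of letting the OS manage the remaining data; adding `MCL_FUTURE` would pin those later mappings as well.

```python
import ctypes
import ctypes.util

# <sys/mman.h> values on x86, x86-64 and ARM Linux; some other
# architectures (e.g. PowerPC, SPARC) define different numbers.
MCL_CURRENT = 1  # lock every page mapped at the time of the call
MCL_FUTURE = 2   # also lock pages mapped later (including later mmap()s)


def try_mlockall(future: bool = False) -> bool:
    """Try to pin this process's memory in RAM; return True on success.

    Unprivileged processes often get EPERM or ENOMEM unless RLIMIT_MEMLOCK
    (`ulimit -l`) is large enough or they hold CAP_IPC_LOCK;
    ctypes.get_errno() holds the reason after a failure.
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        mlockall = libc.mlockall
    except (OSError, AttributeError, TypeError):
        return False  # no usable C library or no mlockall on this platform
    mlockall.argtypes = [ctypes.c_int]
    mlockall.restype = ctypes.c_int
    flags = MCL_CURRENT | (MCL_FUTURE if future else 0)
    return mlockall(flags) == 0
```

Calling this early, before large data files are mapped, is what keeps the split between locked process memory and OS-managed file data.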
> That said, certainly if the total amount of data is less than the
> minimum free after JVM heap, you're very much less likely to see
> swapping. But it's not the intent that you should only use mmap()
> under such circumstances.
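Put as arithmetic, the condition Peter describes is roughly data <= RAM - JVM heap - other overhead. A hypothetical helper expressing that check; the overhead allowance and the example sizes are assumptions for illustration:

```python
GIB = 1024 ** 3


def data_fits_after_heap(data_bytes: int, ram_bytes: int, heap_bytes: int,
                         other_bytes: int = 0) -> bool:
    """Rough check: does the data fit in the RAM left over after the JVM heap?

    other_bytes is an assumed allowance for the OS, non-heap JVM memory and
    other processes; choosing it is up to the operator.
    """
    free_after_heap = ram_bytes - heap_bytes - other_bytes
    return 0 <= data_bytes <= free_after_heap
```

For example, with 16 GiB of RAM, an 8 GiB heap and 1 GiB set aside, 7 GiB remain, so `data_fits_after_heap(5 * GIB, 16 * GIB, 8 * GIB, 1 * GIB)` returns True and the same call with 10 GiB of data returns False. As Peter notes, False only means swapping becomes more likely, not that mmap should be avoided.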
> Also, personally I'm interested in hearing what kind of performance
> impacts people have *actually* seen with standard I/O; especially if
> Cassandra is configured to cache a significant amount of data in
> RAM itself. I'm a bit skeptical about claims of extreme performance
> differences, in spite of syscalls being expensive.
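When the application caches hot data in its own memory, repeated reads of that data cost no system call at all, so per-read syscall overhead applies only to cache misses. A hypothetical LRU block cache in front of `os.pread()` makes that visible by counting the syscalls it issues; it is a loose analogy, not a model of Cassandra's actual caches.

```python
import os
from collections import OrderedDict


class BlockCache:
    """Hypothetical LRU cache of fixed-size blocks read with os.pread()."""

    def __init__(self, fd: int, block_size: int = 4096, capacity: int = 1024):
        self.fd = fd
        self.block_size = block_size
        self.capacity = capacity
        self.blocks: OrderedDict[int, bytes] = OrderedDict()
        self.syscalls = 0  # number of pread() calls actually issued

    def read_block(self, index: int) -> bytes:
        if index in self.blocks:
            # Hit: served from process memory, no system call.
            self.blocks.move_to_end(index)  # mark as most recently used
            return self.blocks[index]
        # Miss: one pread() for the whole block.
        data = os.pread(self.fd, self.block_size, index * self.block_size)
        self.syscalls += 1
        self.blocks[index] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```

With a high hit rate, `syscalls` stays far below the number of reads, which is one reason the gap between mmap and standard I/O may shrink when much of the working set is cached in-process.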
> --
> / Peter Schuller
