incubator-cassandra-user mailing list archives

From Terje Marthinussen <>
Subject Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
Date Thu, 28 Jul 2011 14:03:43 GMT

On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:

> This is not advisable in general, since non-mmap'd I/O is substantially slower.

I see this again and again as a claim here, but it is actually close to 10 years since I saw
mmap'd I/O have any substantial performance benefits on any real life use I have needed.

We have done a lot of testing of this with cassandra as well, and I don't see anything conclusive.
We have run as many tests where normal I/O was faster than mmap as the other way around, and the
differences may very well be within statistical variance given the complexity and number of
factors involved in something like a distributed cassandra working at quorum.
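
One way to check whether two sets of benchmark runs differ by more than their run-to-run noise
is a Welch t statistic. The following is only a minimal sketch: the function name `welch_t` is
hypothetical and the latency values are placeholders invented to show the call, not results from
any of the tests described here.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples of run times."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err

# Placeholder latencies in ms, invented purely to show the call; not real results.
mmap_runs = [10.2, 9.8, 10.5, 10.1, 9.9]
standard_runs = [10.0, 10.3, 9.7, 10.4, 10.0]

t = welch_t(mmap_runs, standard_runs)
print(f"t = {t:.2f}")
```

As a rough rule of thumb, an absolute value well below about 2 suggests the difference between
the two sets of runs is hard to tell apart from noise.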

mmap made a difference in 2000 when memory throughput was still measured in hundreds of megabytes/sec
and CPU caches were a few kilobytes, but today you have megabytes of CPU cache with 100GB/sec
bandwidth, and even memory bandwidth is in the tens of GB/sec.

However, I/O buffers are generally quite small, and copying an I/O buffer from kernel to user
space inside a cache with 100GB/sec bandwidth is really a non-issue given the I/O throughput
cassandra generates.
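
As a back-of-the-envelope illustration of that argument (a sketch, not a measurement): the share
of one core's time spent copying buffers is roughly the I/O throughput divided by the copy
bandwidth. The 1 GB/sec throughput is the SSD read speed mentioned further down in this message,
the bandwidths are the round numbers quoted above, and the function name is made up for
illustration.

```python
def copy_time_fraction(io_gb_per_sec, copy_gb_per_sec):
    """Fraction of each second spent copying I/O buffers from kernel to user space."""
    return io_gb_per_sec / copy_gb_per_sec

# Assumed round numbers taken from the message, not measured values.
in_cache = copy_time_fraction(1.0, 100.0)   # copy stays inside the CPU cache
in_memory = copy_time_fraction(1.0, 10.0)   # copy has to go through main memory

print(f"in cache:  {in_cache:.0%} of one core")
print(f"in memory: {in_memory:.0%} of one core")
```

Under these assumptions the copy costs on the order of 1% of a core when it stays in cache, and
about 10% in the pessimistic case where every byte goes through main memory.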

In 2005 or so, CPUs had already reached a point where I saw mmap perform worse than
regular I/O in a large number of use cases.

Hard to say exactly why, but I saw one theory from a FreeBSD core developer speculating back
then that the extra MMU work involved in some I/O loads may actually be slower than cache
internal memcopy of tiny I/O buffers (they are pretty small after all).

I don't have a personal theory here. I just know that, especially with large numbers of small
I/O operations, regular I/O was typically faster than mmap, which could back up that theory.
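
A comparison of many small reads through both paths can be sketched roughly as follows. This is
a toy micro-benchmark under stated assumptions, not the test setup used here: the file size, read
size, read count and all function names are arbitrary choices for illustration.

```python
import mmap
import os
import random
import tempfile
import time

FILE_SIZE = 16 * 1024 * 1024   # 16 MiB test file (arbitrary)
READ_SIZE = 4096               # small reads, as discussed above
NUM_READS = 20_000

def make_test_file():
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(FILE_SIZE))
    return path

def read_regular(path, offsets):
    """Read each block with seek() + read() on an unbuffered file."""
    out = []
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            out.append(f.read(READ_SIZE))
    return out

def read_mmap(path, offsets):
    """Read each block by slicing a read-only memory mapping of the file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return [m[off:off + READ_SIZE] for off in offsets]

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

path = make_test_file()
try:
    rng = random.Random(0)
    offsets = [rng.randrange(0, FILE_SIZE - READ_SIZE) for _ in range(NUM_READS)]
    regular, t_regular = timed(read_regular, path, offsets)
    mapped, t_mmap = timed(read_mmap, path, offsets)
    print(f"regular I/O: {t_regular:.3f}s  mmap: {t_mmap:.3f}s")
finally:
    os.remove(path)
```

Note that a freshly written file will most likely sit in the page cache, so this mainly compares
system call and copy overhead against page fault and mapping overhead rather than disk speed, and
interpreter overhead will blur the result. Treat any numbers it prints as indicative at best.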

So I wonder how people came to this conclusion, as under no real-life use case with cassandra
am I able to reproduce anything resembling a significant difference, and we have been
benchmarking on nodes with SSD setups which can churn out 1GB/sec+ read speeds.

Way more I/O throughput than most people have at hand and still I cannot get mmap to give
me better performance.

I do, although subjectively, feel that things just seem to work better with regular I/O for
us. We currently have very nice and stable heap sizes regardless of I/O load, and
we have an easier system to operate as we can actually monitor how much memory the darned
thing uses.

My recommendation? Stay away from mmap.

I would love to understand how people got to this conclusion however and try to find out why
we seem to see differences!

> The OP is correct that it is best to disable swap entirely, and
> second-best to enable JNA for mlockall.

Be a bit careful with removing swap completely. Linux is not always happy when it gets short
on memory.
