cassandra-user mailing list archives

From Gurpreet Singh <gurpreet.si...@gmail.com>
Subject Re: kswapd0 causing read timeouts
Date Mon, 18 Jun 2012 18:55:14 GMT
Thanks Aaron.
Sure, the latency numbers are as follows.

cfstats read latency, by disk access mode:

mmap: 70-80 ms. This jumps to more than 100 ms once RES hits 16 GB; I have 16 GB of RAM.
standard: 35-40 ms

I am running a workload of 100 reads/second: 6 threads throttled to 10 requests/second each, with each request asking for anywhere from 1 to 20 keys (avg 10).

On the client side, I measure:

mmap: 23 ms/key
standard: 10 ms/key

This is with no caching enabled in either case. Clearly, standard is twice as fast as mmap for me.

In mmap mode, SHR and RES keep increasing because of the mmapped files, and once RES gets close to 16 GB (the RAM size), the cfstats latency increases to more than 100 ms.
In standard mode, I ran it for 3 days and RES did not change: a very, very stable experience.

I will do the JVM updates as suggested; kernel updates are going to be slower to happen.

Regarding the timeouts, I think they are client-side socket timeouts, which are set to 1 second. For me, it is just an indication that the server cannot keep up with the load anymore once RES hits its maximum. It was able to handle the same load easily until then, though slower than in standard mode.

For now, I will do the JVM updates, but it looks like I will stick to standard mode unless there is a suggestion to try some other setting.

Thanks
Gurpreet

On Sun, Jun 17, 2012 at 8:19 PM, aaron morton <aaron@thelastpickle.com> wrote:

> Can you provide some numbers to explain what is happening with the latency?
> Either through monitoring or using nodetool cfstats; call it twice to get
> the most recent latency per CF, e.g. it starts at 2 ms and ends up at
> 15 seconds. How much slower are we talking about?
>
> When you say the clients are getting timeouts I'm imagining thrift
> TimedOutExceptions, which occur when rpc_timeout has expired. By default
> this is 10 seconds. But they can also be client-side socket timeouts, which
> can be a lot shorter. Which are you getting?
>
>
> +1 for upgrading JVM
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/06/2012, at 8:31 AM, ruslan usifov wrote:
>
> Sorry, I was mistaken; here is the right string:
>
> INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA
> mlockall successful
>
>
> 2012/6/15 ruslan usifov <ruslan.usifov@gmail.com>:
>
> 2012/6/14 Gurpreet Singh <gurpreet.singh@gmail.com>:
>
> JNA is installed. swappiness was 0. vfs_cache_pressure was 100. Two questions on this:
>
> 1. Is there a way to find out if mlockall really worked, other than the "mlockall successful" log message?
>
> Yes, you should see something like this (from our test server):
>
> INFO [main] 2012-06-14 02:03:14,745 DatabaseDescriptor.java (line 233) Global memtable threshold is enabled at 512MB
>
> 2. Does Cassandra only mlock the JVM heap, or also the mmapped memory?
>
> Cassandra mlocks only the heap; it does not mlock the mmapped sstables.
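Beyond the log message, the OS itself can confirm whether pages are locked: the VmLck line in /proc/<pid>/status shows locked memory for any process, including the Cassandra JVM. A minimal sketch, assuming Linux (the MCL_* constant values are taken from <sys/mman.h> and may differ on unusual architectures):

```python
import ctypes

# MCL_CURRENT / MCL_FUTURE values from <sys/mman.h> on Linux (assumption: x86/x86_64)
MCL_CURRENT, MCL_FUTURE = 1, 2

def locked_kib():
    """Return this process's VmLck value in KiB from /proc/self/status, or None."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmLck:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None

if __name__ == "__main__":
    try:
        libc = ctypes.CDLL("libc.so.6", use_errno=True)
        # mlockall may fail (rc = -1) without CAP_IPC_LOCK or a sufficient
        # RLIMIT_MEMLOCK -- the same reason Cassandra's JNA call can fail silently.
        rc = libc.mlockall(MCL_CURRENT | MCL_FUTURE)
    except OSError:
        rc = -1  # no libc.so.6 here; not a Linux box
    print("mlockall rc=%d, VmLck=%s KiB" % (rc, locked_kib()))
```

To check Cassandra itself, read /proc/<cassandra-pid>/status instead of /proc/self/status: a nonzero VmLck means mlockall really took effect.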
>
> I disabled mmap completely, and things look much better. Latency is surprisingly half of what I see when I have mmap enabled.
>
> It's funny that I keep reading tall claims about mmap, but in practice a lot of people have problems with it, especially when it uses up all the memory. We have tried mmap for different purposes in our company before, and finally ended up disabling it, because it just doesn't handle things right when memory is low. Maybe /proc/sys/vm needs to be configured right, but that's not the easiest of configurations to get right.
>
> Right now, I am handling only 80 GB of data. The kernel version is 2.6.26, the Java version 1.6.21.
>
> /G
>
>
>
> On Wed, Jun 13, 2012 at 8:42 PM, Al Tobey <al@ooyala.com> wrote:
>
>
> I would check /etc/sysctl.conf and get the values of /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure.
>
> If you don't have JNA enabled (which Cassandra uses to fadvise) and swappiness is at its default of 60, the Linux kernel will happily swap out your heap for cache space. Set swappiness to 1 or run 'swapoff -a', and kswapd shouldn't be doing much unless you have a too-large heap or some other app using up memory on the system.
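The two vm knobs above can also be read programmatically rather than eyeballed; a small sketch, assuming the standard Linux procfs paths (it degrades to None elsewhere):

```python
def read_vm_sysctl(name):
    """Return the integer value of /proc/sys/vm/<name>, or None if unreadable."""
    try:
        with open("/proc/sys/vm/" + name) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    for knob in ("swappiness", "vfs_cache_pressure"):
        val = read_vm_sysctl(knob)
        print("vm.%s = %s" % (knob, val))
        if knob == "swappiness" and val is not None and val > 1:
            # per the advice above: keep the heap resident
            print("  -> consider 'sysctl vm.swappiness=1'")
```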
>
>
>
> On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov <ruslan.usifov@gmail.com> wrote:
>
> Hm, it's very strange. What is the amount of your data? Your Linux kernel version? Java version?
>
> PS: I can suggest switching diskaccessmode to standard in your case.
> PS PS: Also upgrade your Linux to the latest, and Java HotSpot to 1.6.32 (from the Oracle site).
>
>
> 2012/6/13 Gurpreet Singh <gurpreet.singh@gmail.com>:
>
> Alright, here it goes again...
>
> Even with mmap_index_only, once the RES memory hit 15 gigs, the read
>
> latency
>
> went berserk. This happens in 12 hours if diskaccessmode is mmap, abt
>
> 48 hrs
>
> if its mmap_index_only.
>
>
> only reads happening at 50 reads/second
>
> row cache size: 730 mb, row cache hit ratio: 0.75
>
> key cache size: 400 mb, key cache hit ratio: 0.4
>
> heap size (max 8 gigs): used 6.1-6.9 gigs
>
>
> No messages about reducing cache sizes in the logs
>
>
> stats:
>
> vmstat 1 : no swapping here, however high sys cpu utilization
>
> iostat (looks great) - avg-qu-sz = 8, avg await = 7 ms, svc time = 0.6,
>
> util
>
> = 15-30%
>
> top - VIRT - 19.8g, SHR - 6.1g, RES - 15g, high cpu, buffers - 2mb
>
> cfstats - 70-100 ms. This number used to be 20-30 ms.
>
>
> The value of the SHR keeps increasing (owing to mmap i guess), while at
>
> the
>
> same time buffers keeps decreasing. buffers starts as high as 50 mb,
>
> and
>
> goes down to 2 mb.
>
>
> This is very easily reproducible for me. Every time RES hits about 15 GB, the client starts getting timeouts from Cassandra and sys cpu jumps a lot. All this even though my row cache hit ratio is almost 0.75.
>
> Other than turning off mmap completely, is there any other solution or setting to avoid a Cassandra restart every couple of days? Something to keep RES from hitting such a high number. I have been constantly monitoring RES, and was not seeing issues when it was at 14 GB.
>
> /G
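The constant RES/SHR watching described above can be scripted straight off procfs instead of a top session; a sketch, assuming Linux (field order per the proc(5) description of statm, values in pages):

```python
import os

def res_shr_mib(pid="self"):
    """Return (RES, SHR) in MiB for a process from /proc/<pid>/statm, or None."""
    try:
        with open("/proc/%s/statm" % pid) as f:
            fields = f.read().split()
    except OSError:
        return None  # no such pid, or no procfs on this platform
    page_mib = os.sysconf("SC_PAGE_SIZE") / (1024.0 * 1024.0)
    # statm fields: size, resident, shared, text, lib, data, dt
    return int(fields[1]) * page_mib, int(fields[2]) * page_mib

if __name__ == "__main__":
    # e.g. poll this in a loop for the Cassandra pid and alert as RES nears RAM
    print(res_shr_mib())
```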
>
>
> On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh <gurpreet.singh@gmail.com> wrote:
>
>
> Aaron, Ruslan,
> I changed the disk access mode to mmap_index_only, and it has been stable ever since, well at least for the past 20 hours. Previously, in about 10-12 hours, as soon as resident memory was full, the client would start timing out on all its reads. It looks fine for now; I am going to let it continue, to see how long it lasts and whether the problem comes back.
>
>
> Aaron,
> Yes, I had turned swap off.
>
>
> The total cpu utilization was at roughly 700%. It looked like kswapd0 was using just one cpu, but cassandra (jsvc) cpu utilization increased quite a bit; top was reporting high system cpu and low user cpu. vmstat was not showing swapping. The Java max heap size is 8 GB while only 4 GB was in use, so the heap was doing great; no GC in the logs. iostat was doing OK from what I remember; I will have to reproduce the issue for the exact numbers.
>
> cfstats latency had gone very high, but that is partly due to the high cpu usage.
>
>
> One thing was clear: SHR was inching higher (due to the mmap) while the buffer cache, which started at about 20-25 MB, was reduced to 2 MB by the end, which probably means the page cache was being evicted by kswapd0. Is there a way to fix the size of the buffer cache and not let the system evict it in favour of mmap?
>
>
> Also, mmapping the data files would cause not only the data asked for to be read into main memory, but also a bunch of extra pages (readahead), which would not be very useful, right? The same thing for the index would actually be more useful, as there would be more index entries in the readahead pages, and the index files, being small, wouldn't cause enough memory pressure to evict the page cache. mmapping the data files would make sense if the data size, or at least the hot data set, were smaller than RAM; otherwise just the index would probably be the better thing to mmap, no? In my case the data size is 85 GB, while the available RAM is 16 GB (only 8 GB after the heap).
>
>
> /G
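The readahead concern above is exactly what madvise(MADV_RANDOM) addresses at the syscall level: it tells the kernel the mapping will be accessed randomly, so it should stop pulling in extra pages. A sketch of the mechanism only (not what Cassandra itself does; assumes Python 3.8+ for mmap.madvise, and a Linux/BSD platform for the MADV_RANDOM constant):

```python
import mmap
import os
import tempfile

def map_random_access(path):
    """mmap a file read-only and, where the platform supports it, hint the
    kernel that access will be random so it skips readahead pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the fd is closed
    if hasattr(m, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        m.madvise(mmap.MADV_RANDOM)  # Python 3.8+, Linux/BSD only
    return m

if __name__ == "__main__":
    # demo on a throwaway 4-page file
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * mmap.PAGESIZE * 4)
        path = f.name
    m = map_random_access(path)
    print(len(m), m[:1])
    m.close()
    os.unlink(path)
```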
>
>
>
> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
>
>
> Ruslan,
> Why did you suggest changing the disk_access_mode?
>
> Gurpreet,
> I would leave disk_access_mode at the default until you have a reason to change it.
>
>
> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>
> is swap disabled ?
>
> Gradually, the system cpu becomes high almost 70%, and the client starts getting continuous timeouts
>
> 70% of one core or 70% of all cores?
> Check the server logs: is there GC activity?
> Check nodetool cfstats to see the read latency for the CF.
>
> Take a look at vmstat to see if you are swapping, and look at iostat to see if io is the problem:
> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>
>
> Thanks Ruslan.
> I will try mmap_index_only.
> Is there any guideline as to when to leave it on auto and when to use mmap_index_only?
>
> /G
>
>
> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov <ruslan.usifov@gmail.com> wrote:
>
> disk_access_mode: mmap??
>
> Set disk_access_mode: mmap_index_only in cassandra.yaml.
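For reference, a minimal cassandra.yaml fragment for this setting (option name as in the 1.x yaml; auto, mmap, mmap_index_only, and standard are the accepted values):

```yaml
# auto (default): mmap both data and index files where possible
# mmap_index_only: mmap only the index files, read data files with buffered io
# standard: buffered io for everything -- the mode discussed in this thread
disk_access_mode: mmap_index_only
```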
>
>
> 2012/6/8 Gurpreet Singh <gurpreet.singh@gmail.com>:
>
> Hi,
> I am testing cassandra 1.1 on a 1 node cluster:
> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>
>
> cassandra 1.1.1
> heap size: 8 GB
> key cache size in mb: 800 (only 200 MB used so far)
> memtable_total_space_in_mb: 2048
>
> I am running a read workload: about 30 reads/second, no writes at all. The system runs fine for roughly 12 hours.
>
>
> jconsole shows that my heap usage has hardly touched 4 GB.
> top shows:
>   SHR increasing slowly from 100 MB to 6.6 GB over these 12 hrs
>   RES increasing slowly from 6 GB all the way to 15 GB
>   buffers at a healthy 25 MB at some point, going down to 2 MB over these 12 hrs
>   VIRT staying at 85 GB
>
> I understand that SHR goes up because of mmap, and RES goes up because it includes the SHR value as well.
>
>
> After around 10-12 hrs, the cpu utilization of the system starts increasing, and I notice that the kswapd0 process becomes more active. Gradually, system cpu climbs to almost 70%, and the client starts getting continuous timeouts. The fact that buffers went down from 20 MB to 2 MB suggests that kswapd0 is probably evicting the page cache.
>
> Is there a way to stop kswapd0 from doing this even when there is no swap configured?
>
> This is very easily reproducible for me, and I would like a way out of this situation. Do I need to adjust vm memory-management settings like the pagecache and vfs_cache_pressure, things like that?
>
> Just some extra information: JNA is installed, mlockall is successful, and there is no compaction running.
>
> I would appreciate any help on this.
> Thanks
> Gurpreet
>
