incubator-cassandra-user mailing list archives

From Al Tobey ...@ooyala.com>
Subject Re: kswapd0 causing read timeouts
Date Thu, 14 Jun 2012 03:42:38 GMT
I would check /etc/sysctl.conf and get the values of
/proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure.

If you don't have JNA enabled (which Cassandra uses to fadvise) and
swappiness is at its default of 60, the Linux kernel will happily swap out
your heap for cache space. Set swappiness to 1 or run 'swapoff -a', and kswapd
shouldn't be doing much unless you have a too-large heap or some other app
using up memory on the system.
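
As a rough illustration of the check suggested above, the two values can be read straight from /proc. This is only a sketch: the function names `read_vm_settings` and `swappiness_warning` are made up for this example, and the threshold of 1 simply mirrors the suggestion in this message.

```python
from pathlib import Path


def read_vm_settings(base="/proc/sys/vm",
                     names=("swappiness", "vfs_cache_pressure")):
    """Return {name: int or None} for each VM tunable found under `base`."""
    settings = {}
    for name in names:
        path = Path(base) / name
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain integer.
            settings[name] = None
    return settings


def swappiness_warning(settings, threshold=1):
    """Return a short note if swappiness is above `threshold`, else None."""
    value = settings.get("swappiness")
    if value is None:
        return "swappiness could not be read"
    if value > threshold:
        return f"swappiness is {value}; consider lowering it to {threshold}"
    return None


if __name__ == "__main__":
    current = read_vm_settings()
    for name, value in current.items():
        print(f"vm.{name} = {value}")
    note = swappiness_warning(current)
    if note:
        print(note)
```

The script only reports values; changing them still needs sysctl or an entry in /etc/sysctl.conf.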

On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov <ruslan.usifov@gmail.com> wrote:

> Hm, it's very strange. What is the amount of your data? Your Linux kernel
> version? Java version?
>
> PS: I can suggest switching disk_access_mode to standard in your case.
> PPS: Also upgrade your Linux to the latest, and Java HotSpot to 1.6.32
> (from the Oracle site).
>
> 2012/6/13 Gurpreet Singh <gurpreet.singh@gmail.com>:
> > Alright, here it goes again...
> > Even with mmap_index_only, once the RES memory hit 15 gigs, the read latency
> > went berserk. This happens in 12 hours if disk_access_mode is mmap, about 48 hrs
> > if it's mmap_index_only.
> >
> > only reads happening at 50 reads/second
> > row cache size: 730 mb, row cache hit ratio: 0.75
> > key cache size: 400 mb, key cache hit ratio: 0.4
> > heap size (max 8 gigs): used 6.1-6.9 gigs
> >
> > No messages about reducing cache sizes in the logs
> >
> > stats:
> > vmstat 1: no swapping here, however high sys cpu utilization
> > iostat (looks great): avg-qu-sz = 8, avg await = 7 ms, svc time = 0.6,
> > util = 15-30%
> > top: VIRT - 19.8g, SHR - 6.1g, RES - 15g, high cpu, buffers - 2mb
> > cfstats read latency: 70-100 ms. This number used to be 20-30 ms.
> >
> > The value of SHR keeps increasing (owing to mmap, I guess), while at the
> > same time buffers keeps decreasing. Buffers starts as high as 50 mb and
> > goes down to 2 mb.
> >
> >
> > This is very easily reproducible for me. Every time the RES memory hits about
> > 15 gigs, the client starts getting timeouts from cassandra and the sys cpu
> > jumps a lot. All this, even though my row cache hit ratio is almost 0.75.
> >
> > Other than just turning off mmap completely, is there any other solution or
> > setting to avoid a cassandra restart every couple of days? Something to keep
> > the RES memory from hitting such a high number? I have been constantly monitoring
> > the RES, and was not seeing issues when RES was at 14 gigs.
> > /G
> >
> > On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh <gurpreet.singh@gmail.com>
> > wrote:
> >>
> >> Aaron, Ruslan,
> >> I changed the disk access mode to mmap_index_only, and it has been stable
> >> ever since, well, at least for the past 20 hours. Previously, in about 10-12
> >> hours, as soon as the resident memory was full, the client would start
> >> timing out on all its reads. It looks fine for now; I am going to let it
> >> continue to see how long it lasts and if the problem comes again.
> >>
> >> Aaron,
> >> yes, i had turned swap off.
> >>
> >> The total cpu utilization was at roughly 700%. It looked like kswapd0 was
> >> using just 1 cpu, but cassandra (jsvc) cpu utilization increased quite a
> >> bit. top was reporting high system cpu and low user cpu.
> >> vmstat was not showing swapping. The java heap size max is 8 gigs, while only
> >> 4 gigs was in use, so the java heap was doing great. No gc in the logs. iostat
> >> was doing ok from what I remember; I will have to reproduce the issue for
> >> the exact numbers.
> >>
> >> cfstats latency had gone very high, but that is partly due to high cpu
> >> usage.
> >>
> >> One thing was clear: the SHR was inching higher (due to the mmap)
> >> while the buffer cache, which started at about 20-25 mb, reduced to 2 mb by the
> >> end, which probably means that pagecache was being evicted by kswapd0. Is
> >> there a way to fix the size of the buffer cache and not let the system evict it
> >> in favour of mmap?
> >>
> >> Also, mmapping data files would basically cause not only the data (asked
> >> for) to be read into main memory, but also a bunch of extra pages
> >> (readahead), which would not be very useful, right? The same thing for the index
> >> would actually be more useful, as there would be more index entries in the
> >> readahead part, and the index files, being small, wouldn't cause enough memory
> >> pressure for the page cache to be evicted. mmapping the data files would
> >> make sense if the data size is smaller than the RAM or the hot data set is
> >> smaller than the RAM; otherwise just the index would probably be a better
> >> thing to mmap, no? In my case the data size is 85 gigs, while available RAM is
> >> 16 gigs (only 8 gigs after heap).
> >>
> >> /G
> >>
> >>
> >> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aaron@thelastpickle.com>
> >> wrote:
> >>>
> >>> Ruslan,
> >>> Why did you suggest changing the disk_access_mode ?
> >>>
> >>> Gurpreet,
> >>> I would leave the disk_access_mode with the default until you have a
> >>> reason to change it.
> >>>
> >>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
> >>>
> >>> is swap disabled ?
> >>>
> >>>> > Gradually, the system cpu becomes high almost 70%, and the client
> >>>> > starts getting continuous timeouts
> >>>
> >>> 70% of one core or 70% of all cores ?
> >>> Check the server logs, is there GC activity ?
> >>> check nodetool cfstats to see the read latency for the cf.
> >>>
> >>> Take a look at vmstat to see if you are swapping, and look at iostat to
> >>> see if io is the problem
> >>> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
> >>>
> >>> Cheers
> >>>
> >>> -----------------
> >>> Aaron Morton
> >>> Freelance Developer
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>>
> >>> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
> >>>
> >>> Thanks Ruslan.
> >>> I will try the mmap_index_only.
> >>> Is there any guideline as to when to leave it to auto and when to use
> >>> mmap_index_only?
> >>>
> >>> /G
> >>>
> >>> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov <ruslan.usifov@gmail.com>
> >>> wrote:
> >>>>
> >>>> Is disk_access_mode set to mmap?
> >>>>
> >>>> Set disk_access_mode: mmap_index_only in cassandra.yaml.
> >>>>
> >>>> 2012/6/8 Gurpreet Singh <gurpreet.singh@gmail.com>:
> >>>> > Hi,
> >>>> > I am testing cassandra 1.1 on a 1 node cluster.
> >>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
> >>>> >
> >>>> > cassandra 1.1.1
> >>>> > heap size: 8 gigs
> >>>> > key cache size in mb: 800 (used only 200mb till now)
> >>>> > memtable_total_space_in_mb : 2048
> >>>> >
> >>>> > I am running a read workload: about 30 reads/second, no writes at all.
> >>>> > The system runs fine for roughly 12 hours.
> >>>> >
> >>>> > jconsole shows that my heap size has hardly touched 4 gigs.
> >>>> > top shows -
> >>>> >   SHR increasing slowly from 100 mb to 6.6 gigs in these 12 hrs
> >>>> >   RES increases slowly from 6 gigs all the way to 15 gigs
> >>>> >   buffers are at a healthy 25 mb at some point and that goes down to
> >>>> >   2 mb in these 12 hrs
> >>>> >   VIRT stays at 85 gigs
> >>>> >
> >>>> > I understand that SHR goes up because of mmap, and RES goes up because
> >>>> > it is showing the SHR value as well.
> >>>> >
> >>>> > After around 10-12 hrs, the cpu utilization of the system starts
> >>>> > increasing, and I notice that the kswapd0 process starts becoming more
> >>>> > active. Gradually, the system cpu becomes high, almost 70%, and the
> >>>> > client starts getting continuous timeouts. The fact that the buffers
> >>>> > went down from 20 mb to 2 mb suggests that kswapd0 is probably swapping
> >>>> > out the pagecache.
> >>>> >
> >>>> > Is there a way out of this to avoid the kswapd0 starting to do things
> >>>> > even when there is no swap configured?
> >>>> > This is very easily reproducible for me, and I would like a way out of
> >>>> > this situation. Do I need to adjust vm memory management stuff like
> >>>> > pagecache, vfs_cache_pressure.. things like that?
> >>>> >
> >>>> > Just some extra information: jna is installed, mlockall is successful,
> >>>> > and there is no compaction running.
> >>>> > Would appreciate any help on this.
> >>>> > Thanks
> >>>> > Gurpreet
> >>>> >
> >>>> >
> >>>
> >>>
> >>>
> >>
> >
>
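
The RES, VIRT and buffers figures quoted in this thread were read off top by hand. A minimal sketch of sampling the same numbers from /proc follows, assuming a Linux host; `parse_kb_fields` and `snapshot` are names invented for this example, and the script takes the Cassandra pid on the command line.

```python
import sys
from pathlib import Path


def parse_kb_fields(text):
    """Parse 'Name:   123 kB' lines into {name: kilobytes}; skip other lines."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def to_mb(kilobytes):
    """Convert kB to MB, passing None through for missing fields."""
    return None if kilobytes is None else kilobytes / 1024


def snapshot(pid, proc_root="/proc"):
    """Return RES/VIRT for one process plus system buffers/cached, in MB."""
    root = Path(proc_root)
    status = parse_kb_fields((root / str(pid) / "status").read_text())
    meminfo = parse_kb_fields((root / "meminfo").read_text())
    return {
        "res_mb": to_mb(status.get("VmRSS")),      # top's RES
        "virt_mb": to_mb(status.get("VmSize")),    # top's VIRT
        "buffers_mb": to_mb(meminfo.get("Buffers")),
        "cached_mb": to_mb(meminfo.get("Cached")),
    }


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Prints a single sample; run it periodically (e.g. from cron) to get a trend.
    print(snapshot(int(sys.argv[1])))
```

Logging one sample a minute would show whether buffers shrink as RES approaches the 15 gig mark described above.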
