incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: kswapd0 causing read timeouts
Date Mon, 18 Jun 2012 03:19:24 GMT
Can you provide some numbers to explain what is happening with the latency? Either through
monitoring or by using nodetool cfstats; call it twice to get the most recent latency per CF.
e.g. it starts at 2 ms and ends up at 15 seconds. How much slower are we talking about?
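For example, something along these lines (a sketch; the two heredocs stand in for real `nodetool cfstats` captures taken a minute or so apart, and the latency figures in them are made up for illustration):

```shell
# Two snapshots of the per-CF read latency, taken some time apart; here
# they are stubbed with hypothetical values so the pipeline is visible.
cat > /tmp/cfstats_t0 <<'EOF'
        Read Latency: 2.1 ms.
EOF
cat > /tmp/cfstats_t1 <<'EOF'
        Read Latency: 15000.0 ms.
EOF
# Pull out just the latency figures to compare the two snapshots.
awk '/Read Latency/ {print $3}' /tmp/cfstats_t0 /tmp/cfstats_t1
```

Since cfstats reports latency accumulated since startup, diffing two snapshots like this shows the recent trend rather than the lifetime average.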

When you say the clients are getting timeouts I'm imagining Thrift TimedOutExceptions, which
occur when rpc_timeout has expired. By default this is 10 seconds. But they can also be
client-side socket timeouts, which can be a lot shorter. Which are you getting?


+1 for upgrading JVM

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 8:31 AM, ruslan usifov wrote:

> Sorry, I was mistaken; here is the right string:
> 
> INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA
> mlockall successful
> 
> 
> 
> 
> 2012/6/15 ruslan usifov <ruslan.usifov@gmail.com>:
>> 2012/6/14 Gurpreet Singh <gurpreet.singh@gmail.com>:
>>> JNA is installed. swappiness was 0. vfs_cache_pressure was 100. 2 questions
>>> on this:
>>> 1. Is there a way to find out if mlockall really worked, other than just the
>>> mlockall successful log message?
>> Yes, you should see something like this (from our test server):
>> 
>>  INFO [main] 2012-06-14 02:03:14,745 DatabaseDescriptor.java (line
>> 233) Global memtable threshold is enabled at 512MB
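On Linux there is also a way to confirm the lock outside the logs: locked memory shows up as a non-zero VmLck line in /proc/<pid>/status. A sketch (the pgrep pattern for the Cassandra JVM is an assumption; adjust it to your process name):

```shell
# Print the locked-memory line for a given pid; a non-zero VmLck value
# after startup means mlockall really took effect.
check_vmlck() {
    grep '^VmLck' "/proc/$1/status"
}

# Against Cassandra you would do something like:
#   check_vmlck "$(pgrep -f CassandraDaemon | head -n1)"
# Demonstrated here on the current shell (which has locked nothing,
# so it prints VmLck: 0 kB):
check_vmlck $$
```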
>> 
>> 
>>> 2. Does cassandra only mlock the jvm heap or also the mmaped memory?
>> 
>> Cassandra obviously only mlocks the heap; it doesn't mlock the mmapped sstables.
>> 
>> 
>>> 
>>> I disabled mmap completely, and things look so much better.
>>> Latency is surprisingly half of what I see when I have mmap enabled.
>>> It's funny that I keep reading tall claims about mmap, but in practice a lot of
>>> people have problems with it, especially when it uses up all the memory. We
>>> have tried mmap for different purposes in our company before, and finally
>>> ended up disabling it, because it just doesn't handle things right when
>>> memory is low. Maybe /proc/sys/vm needs to be configured right, but that's
>>> not the easiest of configurations to get right.
>>> 
>>> Right now, I am handling only 80 gigs of data. Kernel version is 2.6.26.
>>> Java version is 1.6.21.
>>> /G
>>> 
>>> 
>>> On Wed, Jun 13, 2012 at 8:42 PM, Al Tobey <al@ooyala.com> wrote:
>>>> 
>>>> I would check /etc/sysctl.conf and get the values of
>>>> /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure.
>>>> 
>>>> If you don't have JNA enabled (which Cassandra uses to fadvise) and
>>>> swappiness is at its default of 60, the Linux kernel will happily swap out
>>>> your heap for cache space.  Set swappiness to 1 or 'swapoff -a' and kswapd
>>>> shouldn't be doing much unless you have a too-large heap or some other app
>>>> using up memory on the system.
>>>> 
>>>> 
>>>> On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov <ruslan.usifov@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hm, that's very strange. What is the amount of your data? Your Linux
>>>>> kernel version? Java version?
>>>>> 
>>>>> PS: I can suggest switching disk_access_mode to standard in your case.
>>>>> PS PS: also upgrade your Linux to the latest, and Java HotSpot to 1.6.32
>>>>> (from the Oracle site).
>>>>> 
>>>>> 2012/6/13 Gurpreet Singh <gurpreet.singh@gmail.com>:
>>>>>> Alright, here it goes again...
>>>>>> Even with mmap_index_only, once the RES memory hit 15 gigs, the read
>>>>>> latency went berserk. This happens in 12 hours if diskaccessmode is mmap,
>>>>>> about 48 hrs if it's mmap_index_only.
>>>>>> 
>>>>>> only reads happening at 50 reads/second
>>>>>> row cache size: 730 mb, row cache hit ratio: 0.75
>>>>>> key cache size: 400 mb, key cache hit ratio: 0.4
>>>>>> heap size (max 8 gigs): used 6.1-6.9 gigs
>>>>>> 
>>>>>> No messages about reducing cache sizes in the logs
>>>>>> 
>>>>>> stats:
>>>>>> vmstat 1: no swapping here, however high sys cpu utilization
>>>>>> iostat (looks great) - avg-qu-sz = 8, avg await = 7 ms, svc time = 0.6,
>>>>>> util = 15-30%
>>>>>> top - VIRT - 19.8g, SHR - 6.1g, RES - 15g, high cpu, buffers - 2mb
>>>>>> cfstats - 70-100 ms. This number used to be 20-30 ms.
>>>>>> 
>>>>>> The value of SHR keeps increasing (owing to mmap, I guess), while at the
>>>>>> same time buffers keep decreasing. Buffers start as high as 50 mb, and
>>>>>> go down to 2 mb.
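That drop can be watched directly from /proc/meminfo rather than inferred from top, e.g.:

```shell
# Snapshot the buffer and page-cache counters; run this periodically
# (or under `watch -n 5`) while RES grows, to see the buffers shrink
# as mmapped sstable pages take over.
grep -E '^(Buffers|Cached|SwapCached):' /proc/meminfo
```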
>>>>>> 
>>>>>> 
>>>>>> This is very easily reproducible for me. Every time the RES memory hits
>>>>>> about 15 gigs, the client starts getting timeouts from cassandra, and the
>>>>>> sys cpu jumps a lot. All this, even though my row cache hit ratio is
>>>>>> almost 0.75.
>>>>>> 
>>>>>> Other than just turning off mmap completely, is there any other solution
>>>>>> or setting to avoid a cassandra restart every couple of days? Something
>>>>>> to keep the RES memory from hitting such a high number. I have been
>>>>>> constantly monitoring the RES, and was not seeing issues when RES was at
>>>>>> 14 gigs.
>>>>>> /G
>>>>>> 
>>>>>> On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh
>>>>>> <gurpreet.singh@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Aaron, Ruslan,
>>>>>>> I changed the disk access mode to mmap_index_only, and it has been
>>>>>>> stable ever since, well at least for the past 20 hours. Previously, in
>>>>>>> about 10-12 hours, as soon as the resident memory was full, the client
>>>>>>> would start timing out on all its reads. It looks fine for now; I am
>>>>>>> going to let it continue, to see how long it lasts and if the problem
>>>>>>> comes again.
>>>>>>> 
>>>>>>> Aaron,
>>>>>>> yes, I had turned swap off.
>>>>>>> 
>>>>>>> The total cpu utilization was at 700% roughly. It looked like kswapd0
>>>>>>> was using just 1 cpu, but cassandra (jsvc) cpu utilization increased
>>>>>>> quite a bit. top was reporting high system cpu, and low user cpu.
>>>>>>> vmstat was not showing swapping. Java heap size max is 8 gigs, while
>>>>>>> only 4 gigs was in use, so the java heap was doing great. No gc in the
>>>>>>> logs. iostat was doing ok from what I remember; I will have to reproduce
>>>>>>> the issue for the exact numbers.
>>>>>>> 
>>>>>>> cfstats latency had gone very high, but that is partly due to the high
>>>>>>> cpu usage.
>>>>>>> 
>>>>>>> One thing was clear: the SHR was inching higher (due to the mmap) while
>>>>>>> the buffer cache, which started at about 20-25 mb, reduced to 2 mb by
>>>>>>> the end, which probably means that the pagecache was being evicted by
>>>>>>> kswapd0. Is there a way to fix the size of the buffer cache and not let
>>>>>>> the system evict it in favour of mmap?
>>>>>>> 
>>>>>>> Also, mmapping data files would basically cause not only the data
>>>>>>> (asked for) to be read into main memory, but also a bunch of extra
>>>>>>> pages (readahead), which would not be very useful, right? The same thing
>>>>>>> for the index would actually be more useful, as there would be more
>>>>>>> index entries in the readahead part, and the index files, being small,
>>>>>>> wouldn't cause such memory pressure that the page cache would be
>>>>>>> evicted. mmapping the data files would make sense if the data size is
>>>>>>> smaller than the RAM, or the hot data set is smaller than the RAM;
>>>>>>> otherwise just the index would probably be a better thing to mmap, no?
>>>>>>> In my case data size is 85 gigs, while available RAM is 16 gigs (only 8
>>>>>>> gigs after heap).
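One knob that bears directly on the "extra pages" concern is the per-device readahead setting; a sketch for inspecting it (device names vary, and the suggested value is only an assumption for a random-read workload):

```shell
# Readahead is configured per block device; large values amplify the
# number of neighbouring pages pulled in around each mmap page fault.
for q in /sys/block/*/queue/read_ahead_kb; do
    [ -e "$q" ] && printf '%s: %s kB\n' "$q" "$(cat "$q")"
done
# As root, a smaller value suits random-read workloads, e.g.:
#   echo 64 > /sys/block/sda/queue/read_ahead_kb
```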
>>>>>>> 
>>>>>>> /G
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton
>>>>>>> <aaron@thelastpickle.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Ruslan,
>>>>>>>> Why did you suggest changing the disk_access_mode ?
>>>>>>>> 
>>>>>>>> Gurpreet,
>>>>>>>> I would leave the disk_access_mode with the default until
you have a
>>>>>>>> reason to change it.
>>>>>>>> 
>>>>>>>>>> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>>>>>>> 
>>>>>>>> is swap disabled ?
>>>>>>>> 
>>>>>>>>> Gradually,
>>>>>>>>>> the system cpu becomes high, almost 70%, and the client starts
>>>>>>>>>> getting continuous timeouts
>>>>>>>> 
>>>>>>>> 70% of one core or 70% of all cores?
>>>>>>>> Check the server logs; is there GC activity?
>>>>>>>> Check nodetool cfstats to see the read latency for the CF.
>>>>>>>> 
>>>>>>>> Take a look at vmstat to see if you are swapping, and look at iostat
>>>>>>>> to see if io is the problem:
>>>>>>>> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
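The swap side of that triage can also be read straight from /proc, which is handy when the sysstat tools are not installed:

```shell
# Cumulative pages swapped in/out since boot; if these counters are not
# moving between samples, kswapd activity is page-cache reclaim, not
# swapping.
grep -E '^pswp(in|out) ' /proc/vmstat
# With sysstat installed you would instead watch live:
#   vmstat 1      (si/so columns)
#   iostat -x 5   (%util and await per disk)
```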
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> 
>>>>>>>> -----------------
>>>>>>>> Aaron Morton
>>>>>>>> Freelance Developer
>>>>>>>> @aaronmorton
>>>>>>>> http://www.thelastpickle.com
>>>>>>>> 
>>>>>>>> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>>>>>>>> 
>>>>>>>> Thanks Ruslan.
>>>>>>>> I will try the mmap_index_only.
>>>>>>>> Is there any guideline as to when to leave it on auto and when to use
>>>>>>>> mmap_index_only?
>>>>>>>> 
>>>>>>>> /G
>>>>>>>> 
>>>>>>>> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov
>>>>>>>> <ruslan.usifov@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> disk_access_mode: mmap??
>>>>>>>>> 
>>>>>>>>> Set disk_access_mode: mmap_index_only in cassandra.yaml.
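A sketch of that edit, applied here to a scratch copy so it is safe to run anywhere (on a real node you would edit conf/cassandra.yaml and restart the node):

```shell
# Stub config standing in for cassandra.yaml.
conf=/tmp/cassandra.yaml.demo
printf 'disk_access_mode: auto\n' > "$conf"
# Flip the access mode so only the index files are mmapped.
sed -i 's/^disk_access_mode:.*/disk_access_mode: mmap_index_only/' "$conf"
cat "$conf"   # disk_access_mode: mmap_index_only
```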
>>>>>>>>> 
>>>>>>>>> 2012/6/8 Gurpreet Singh <gurpreet.singh@gmail.com>:
>>>>>>>>>> Hi,
>>>>>>>>>> I am testing cassandra 1.1 on a 1 node cluster.
>>>>>>>>>> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>>>>>>>>> 
>>>>>>>>>> cassandra 1.1.1
>>>>>>>>>> heap size: 8 gigs
>>>>>>>>>> key cache size in mb: 800 (used only 200mb till now)
>>>>>>>>>> memtable_total_space_in_mb : 2048
>>>>>>>>>> 
>>>>>>>>>> I am running a read workload, about 30 reads/second, no writes at all.
>>>>>>>>>> The system runs fine for roughly 12 hours.
>>>>>>>>>> 
>>>>>>>>>> jconsole shows that my heap size has hardly touched 4 gigs.
>>>>>>>>>> top shows -
>>>>>>>>>>   SHR increasing slowly from 100 mb to 6.6 gigs in these 12 hrs
>>>>>>>>>>   RES increases slowly from 6 gigs all the way to 15 gigs
>>>>>>>>>>   buffers are at a healthy 25 mb at some point and that goes down
>>>>>>>>>> to 2 mb in these 12 hrs
>>>>>>>>>>   VIRT stays at 85 gigs
>>>>>>>>>> 
>>>>>>>>>> I understand that SHR goes up because of mmap, and RES goes up
>>>>>>>>>> because it is showing the SHR value as well.
>>>>>>>>>> 
>>>>>>>>>> After around 10-12 hrs, the cpu utilization of the system starts
>>>>>>>>>> increasing, and I notice that the kswapd0 process starts becoming
>>>>>>>>>> more active. Gradually, the system cpu becomes high, almost 70%, and
>>>>>>>>>> the client starts getting continuous timeouts. The fact that the
>>>>>>>>>> buffers went down from 20 mb to 2 mb suggests that kswapd0 is
>>>>>>>>>> probably evicting the pagecache.
>>>>>>>>>> 
>>>>>>>>>> Is there a way out of this, to avoid kswapd0 starting to do things
>>>>>>>>>> even when there is no swap configured?
>>>>>>>>>> This is very easily reproducible for me, and I would like a way out
>>>>>>>>>> of this situation. Do I need to adjust vm memory management stuff
>>>>>>>>>> like pagecache, vfs_cache_pressure, things like that?
>>>>>>>>>> 
>>>>>>>>>> Just some extra information: jna is installed, mlockall is
>>>>>>>>>> successful, and there is no compaction running.
>>>>>>>>>> Would appreciate any help on this.
>>>>>>>>>> Thanks
>>>>>>>>>> Gurpreet
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 

