cassandra-dev mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: Use of posix_fadvise
Date Tue, 18 Oct 2016 16:34:06 GMT
Although given we have an in-process page cache[1] now, this may not be
needed anymore?
This is only for the data file though.  I think it's been years since we
showed it helped, so perhaps someone should check whether this is still
working/helping in the real world.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5863


On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <
mkjellman@internalcircle.com> wrote:

> Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."
>
> On Oct 18, 2016, at 8:57 AM, Michael Kjellman <
> mkjellman@internalcircle.com> wrote:
>
> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so with respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? It's sorta hard to tell, as
> there are many different code paths through which the reference could have
> tidy() called.
>
> Why would we want to drop the segment we just wrote from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be, wouldn't it be better in this case to let the kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> On Oct 18, 2016, at 8:50 AM, Jake Luciani <jakers@gmail.com> wrote:
>
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
>
> On Oct 18, 2016 11:29 AM, "Michael Kjellman" <mkjellman@internalcircle.com>
> wrote:
>
> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check that the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve, and
> what's the expected behavior regardless of kernel versions?
>
> best,
> kjellman
>
> Sent from my iPhone
>
>
>
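
For reference, the kernel-version check that the quoted thread says is
missing could be sketched roughly as below. This is not Cassandra code: the
class FadviseLength and both method names are hypothetical, and the sketch
assumes the kernel release string starts with the usual major.minor[.patch]
prefix (on Linux JVMs the os.version system property typically reports it).
It only computes the length argument to pass, using the 2.6.6 cutoff from
the man page quote above; it does not call posix_fadvise itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Cassandra.
final class FadviseLength {
    // Leading "major.minor" or "major.minor.patch" of a release string
    // such as "4.4.0-45-generic".
    private static final Pattern RELEASE =
        Pattern.compile("^(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    private FadviseLength() {}

    // True when the kernel treats len == 0 as "through to the end of the
    // file", i.e. release 2.6.6 or later per the man page quote.
    public static boolean zeroLengthMeansWholeFile(String kernelRelease) {
        Matcher m = RELEASE.matcher(kernelRelease);
        if (!m.find()) {
            throw new IllegalArgumentException(
                "unrecognized kernel release: " + kernelRelease);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        if (major != 2) return major > 2;
        if (minor != 6) return minor > 6;
        return patch >= 6;
    }

    // Length to pass so the advice covers [offset, end of file).
    // Newer kernels accept 0; older ones need the byte count spelled out.
    public static long effectiveLength(String kernelRelease,
                                       long offset, long fileSize) {
        if (zeroLengthMeansWholeFile(kernelRelease)) {
            return 0L;
        }
        return Math.max(0L, fileSize - offset);
    }

    public static void main(String[] args) {
        // Example release strings; a real caller might read
        // System.getProperty("os.version") instead (an assumption).
        System.out.println(effectiveLength("2.6.5", 0L, 4096L));
        System.out.println(effectiveLength("4.4.0-45-generic", 0L, 4096L));
    }
}
```

Passing an explicit length on pre-2.6.6 kernels keeps the intent ("the
whole file") the same on both sides of the behavior change, which is one
way to answer the "expected behavior regardless of kernel versions"
question for the length argument.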


-- 
http://twitter.com/tjake
