hadoop-common-issues mailing list archives

From "Cristina L. Abad (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7714) Add support in native libs for OS buffer cache management
Date Thu, 06 Oct 2011 02:36:29 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121682#comment-13121682 ]

Cristina L. Abad commented on HADOOP-7714:
------------------------------------------

I got started with some testing today and can definitely see the effect of the fadvise on
the page cache. However, in my tests I was still seeing about 8MB in the page cache belonging
to a 64MB block for which fadvise calls were being issued (I checked this with strace). I
looked into the BlockSender code and there seems to be an error in the fadvise call:

NativeIO.posixFadviseIfPossible(blockInFd, lastCacheDropOffset, offset - 1024, NativeIO.POSIX_FADV_DONTNEED);

should be

NativeIO.posixFadviseIfPossible(blockInFd, lastCacheDropOffset, offset - lastCacheDropOffset,
NativeIO.POSIX_FADV_DONTNEED);

I am not sure what the "- 1024" is for, but in any case the third parameter should be a length
rather than an offset. Having said that, this is not what is causing the 8MB to stay in the
page cache. I tried changing the fadvise call to:

// Yes, I know, this is redundant since it is being called frequently
NativeIO.posixFadviseIfPossible(blockInFd, 0, offset, NativeIO.POSIX_FADV_DONTNEED);

and that reduced the amount of data left in the cache from 8MB to a value that varies between 0 and 2MB.
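
To make the difference between the calls discussed above concrete, here is a minimal sketch of
the range arithmetic only. It does not call NativeIO; the class and method names are hypothetical,
and it simply assumes that the second and third arguments are (offset, length), as in posix_fadvise.

```java
// Hypothetical helper, not part of Hadoop: compares the byte ranges that the
// original and the corrected fadvise arguments would describe.
public class CacheDropRange {

    /** Corrected third argument: the length of [lastCacheDropOffset, offset). */
    static long correctedLength(long lastCacheDropOffset, long offset) {
        if (offset < lastCacheDropOffset) {
            throw new IllegalArgumentException("offset is before lastCacheDropOffset");
        }
        return offset - lastCacheDropOffset;
    }

    /** Third argument as written in the original call: an offset minus 1024, not a length. */
    static long originalThirdArgument(long offset) {
        return offset - 1024;
    }

    /** End (exclusive) of the range described by a start offset and a length. */
    static long rangeEnd(long start, long length) {
        return start + length;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long lastCacheDropOffset = 8 * mb; // illustrative values only
        long offset = 16 * mb;

        long good = correctedLength(lastCacheDropOffset, offset);
        long bad = originalThirdArgument(offset);

        System.out.println("corrected range ends at "
            + rangeEnd(lastCacheDropOffset, good));
        System.out.println("original range ends at "
            + rangeEnd(lastCacheDropOffset, bad));
    }
}
```

With these illustrative values the corrected range ends exactly at the current offset (16MB),
while the range described by the original arguments ends 1024 bytes short of 24MB, that is,
past the current position in the block.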

I looked into what pages were remaining in the cache and it seems that a few random pages
are staying every time, plus some pages at the end.

Nathan (Roberts) and I looked through this issue and thought the kernel's read-ahead mechanism
may be preventing some pages from being removed from the page cache, so we changed POSIX_FADV_SEQUENTIAL
to POSIX_FADV_RANDOM. That seemed to make things better, in the sense that I am now only
seeing very few pages staying around (around 4-8 4KB pages in the latest tests I ran today).
Having said that, POSIX_FADV_SEQUENTIAL is of course what we should be using. Any ideas on
how to make all the fadvised (POSIX_FADV_DONTNEED) pages go away? We are puzzled as to why those last
few pages seem to hang around, especially since I modified the fadvise calls to go from offset
0 every time; in other words, we are repeatedly telling the kernel to remove those pages and
still a few manage to stay around. I'll keep looking into this issue and will try to get some
performance numbers, but would love to have the code working as expected before doing the tests.
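
One thing that may be worth ruling out for the pages at the end (it would not explain the
random ones): the Linux posix_fadvise man page says that requests to discard partial pages are
ignored, so a POSIX_FADV_DONTNEED range whose end is not page-aligned can leave its last page in
the cache. The sketch below only does the alignment arithmetic; the class name, method name and
the 4KB page size are assumptions for illustration, not anything taken from BlockSender.

```java
// Hypothetical helper, not part of Hadoop: computes the whole pages that lie
// fully inside a byte range, assuming 4KB pages.
public class PageAlignedRange {

    static final long PAGE_SIZE = 4096; // assumption: 4KB pages

    /**
     * Returns {start, length} of the page-aligned sub-range fully contained in
     * [offset, offset + len). The length is 0 if no whole page fits.
     */
    static long[] wholePages(long offset, long len) {
        // Round the start up to the next page boundary.
        long start = ((offset + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
        // Round the end down to the previous page boundary.
        long end = ((offset + len) / PAGE_SIZE) * PAGE_SIZE;
        long alignedLen = Math.max(0L, end - start);
        return new long[] {start, alignedLen};
    }

    public static void main(String[] args) {
        // A range ending in the middle of a page: the trailing partial page is excluded.
        long[] r = wholePages(0, 10000);
        System.out.println("start=" + r[0] + " length=" + r[1]);
    }
}
```

If the offsets passed to fadvise are not multiples of the page size, comparing the requested
range with the output of a helper like this would show which trailing bytes fall into a partial page.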

BTW, I did not look into the BlockReceiver code; once BlockSender is working as expected I'll
look into it.

I hope that what I wrote makes sense; if something is not clear I'll be happy to explain the
issue in more detail.
                
> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>
>                 Key: HADOOP-7714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7714
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache is important.
Currently, running a big MR job will evict all of HBase's hot data from cache, causing HBase
performance to really suffer. However, caching of the MR input/output is rarely useful, since
the datasets tend to be larger than cache and not re-read often enough that the cache is used.
Having access to the native calls {{posix_fadvise}} and {{sync_file_range}} on platforms where
they are supported would allow us to do a better job of managing this cache.


        
