hadoop-common-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7714) Add support in native libs for OS buffer cache management
Date Thu, 06 Oct 2011 03:50:29 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121705#comment-13121705 ]

Todd Lipcon commented on HADOOP-7714:

Thanks for the comments. I'm doing some more refactoring of the readahead code and, as an
experiment, integrating it into parts of the shuffle as well. Running a terasort, my current
iteration of the patch slowed down the map phase by a few percent but substantially sped up
the reduce phase (I'm running untuned settings, so the reduce phase has lots of IFile merges,
which I've plugged readahead into). The good news is that the total runtime for my terasort
went from 40m13s to 33m7s (a speedup of roughly 20%). The reduce phase (counting from when the
map phase read 100% on my console) went from about 19 minutes down to 9 minutes.
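
As a quick check of the arithmetic, here is a small Python sketch that uses only the timings
quoted above. The helper names (to_seconds, percent_reduction, speedup_factor) are made up for
illustration, and the reduce-phase figures are the approximate ones from the comment.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds wall-clock reading to seconds."""
    return minutes * 60 + seconds


def percent_reduction(before, after):
    """Percentage of the original runtime that was saved."""
    return 100.0 * (before - after) / before


def speedup_factor(before, after):
    """How many times faster the new run is than the old one."""
    return before / after


# Total terasort runtime: 40m13s before, 33m7s after.
total_before = to_seconds(40, 13)
total_after = to_seconds(33, 7)

# Reduce phase: roughly 19 minutes before, 9 minutes after.
reduce_before = to_seconds(19)
reduce_after = to_seconds(9)

if __name__ == "__main__":
    print("total:  %.1f%% less time, %.2fx faster" % (
        percent_reduction(total_before, total_after),
        speedup_factor(total_before, total_after)))
    print("reduce: %.1f%% less time, %.2fx faster" % (
        percent_reduction(reduce_before, reduce_after),
        speedup_factor(reduce_before, reduce_after)))
```

Measured as time saved, the total comes out a little under 20%; measured as a throughput ratio,
it comes out a little over. Either way it is consistent with "roughly 20%".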

It might be that the map phase slowed down due to the bug you found above. Let me look into
that and run some more experiments - then I'll post another patch.

> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>                 Key: HADOOP-7714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7714
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-7714-20s-prelim.txt
>
> Especially in shared HBase/MR situations, management of the OS buffer cache is important.
> Currently, running a big MR job will evict all of HBase's hot data from cache, causing HBase
> performance to really suffer. However, caching of the MR input/output is rarely useful, since
> the datasets tend to be larger than cache and not re-read often enough that the cache is used.
> Having access to the native calls {{posix_fadvise}} and {{sync_file_range}} on platforms where
> they are supported would allow us to do a better job of managing this cache.
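
To illustrate the kind of cache hint the description has in mind, here is a minimal sketch in
Python. It is not the Hadoop native code and not the attached patch. The helper name read_once
and the 1 MiB chunk size are assumptions made for the example. os.posix_fadvise is the Python
standard library wrapper for the POSIX call and exists only on some Unix platforms, so the
sketch checks for it first. The range-sync call is not shown because Python's standard library
does not expose it.

```python
import os


def read_once(path, chunk_size=1 << 20):
    """Read a file front to back once, advising the kernel not to cache it.

    Returns the number of bytes read. On platforms without posix_fadvise
    the file is still read; only the cache hints are skipped.
    """
    have_fadvise = hasattr(os, "posix_fadvise")
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if have_fadvise:
            # Length 0 means "to end of file": expect sequential access,
            # which lets the kernel read ahead more aggressively.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            # ... a real job would process `data` here ...
            if have_fadvise:
                # This range will not be re-read, so the kernel may drop
                # it instead of evicting another process's hot pages.
                os.posix_fadvise(fd, offset, len(data),
                                 os.POSIX_FADV_DONTNEED)
            offset += len(data)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

These calls are advisory: the kernel is free to ignore them, and DONTNEED only releases clean
pages, so a write path would need to flush the data before giving the same hint.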


