hadoop-common-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7714) Add support in native libs for OS buffer cache management
Date Thu, 06 Oct 2011 21:19:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122309#comment-13122309 ]

Todd Lipcon commented on HADOOP-7714:
-------------------------------------

I elected to use sync_file_range instead of fdatasync, since I think fdatasync will enqueue
the writes as synchronous operations in the block device's IO queue, causing them to get pushed
to the front. sync_file_range just triggers the dirty page writeback path, which goes through
the async queue. This gives the IO scheduler more time to reorder the writes, since no one
is waiting on the result. I agree fdatasync would do a better job of keeping the cache clean,
but I imagine it would hurt performance a bit.
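
To make the distinction concrete, here is a minimal C sketch of the pattern described above, assuming Linux with glibc. The helper names write_and_start_writeback and drop_cached_range are hypothetical and are not taken from the attached patch; only pwrite, sync_file_range, and posix_fadvise are real calls. The first helper writes a range and then asks the kernel to start writeback for it without waiting; the second later advises the kernel that the range will not be needed again.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset, then ask the kernel to
 * begin writeback of that range without waiting for it to finish.
 * Returns 0 on success, -1 with errno set on failure.
 */
int write_and_start_writeback(int fd, const void *buf, size_t len, off_t offset)
{
    assert(buf != NULL);
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before writing: retry */
            return -1;
        }
        done += (size_t)n;
    }

    /*
     * SYNC_FILE_RANGE_WRITE only initiates writeback of dirty pages in the
     * range; it does not block until the data is on disk and gives no
     * durability guarantee. Note that nbytes == 0 means "to end of file".
     */
    return sync_file_range(fd, offset, (off_t)len, SYNC_FILE_RANGE_WRITE);
}

/*
 * Hypothetical helper: hint that the range will not be re-read, so its clean
 * pages may be dropped from the buffer cache. Pages that are still dirty are
 * left alone, so this works best once writeback has had time to complete.
 * Returns 0 on success, -1 with errno set on failure.
 */
int drop_cached_range(int fd, off_t offset, off_t len)
{
    /* posix_fadvise returns the error number directly instead of setting errno. */
    int err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A writer streaming a large file might call write_and_start_writeback for each chunk and call drop_cached_range on a chunk written some time earlier, so that most of its pages are already clean when the hint arrives. Swapping the sync_file_range call for fdatasync(fd) would give the blocking behavior that the comment above argues against.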
                
> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>
>                 Key: HADOOP-7714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7714
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: graphs.pdf, hadoop-7714-20s-prelim.txt
>
>
> Especially in shared HBase/MR situations, management of the OS buffer cache is important.
> Currently, running a big MR job will evict all of HBase's hot data from cache, causing HBase
> performance to really suffer. However, caching of the MR input/output is rarely useful, since
> the datasets tend to be larger than cache and are not re-read often enough for the cache to help.
> Having access to the native calls {{posix_fadvise}} and {{sync_file_range}} on platforms where
> they are supported would allow us to do a better job of managing this cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
