hadoop-common-issues mailing list archives

From "Scott Carey (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7714) Add support in native libs for OS buffer cache management
Date Fri, 07 Oct 2011 08:32:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122616#comment-13122616 ]

Scott Carey commented on HADOOP-7714:

{quote}I think the issue is that Linux's native readahead is not very aggressive{quote}

I have been tuning my systems for quite a while with aggressive OS readahead.  The default
is 128K, but it can be raised significantly, which helps quite a bit on sequential reads to
SATA drives.  Additionally, the 'deadline' scheduler is better at sequential throughput under
contention.  I wonder how much of your manual read-ahead is just compensating for the poor
OS defaults?  In other applications, I maximized read speeds (and reduced CPU use) by using
small read buffers in Java (32KB) and large Linux read-ahead settings.

Additionally, I always set up a separate file system for M/R temp space, away from HDFS.  The
HDFS one is tuned for sequential reads and fast flush from OS buffers to disk, with the deadline
scheduler.  The temp space is tuned to delay flush to disk for up to 60 seconds (small jobs
don't even make it to disk this way), and uses the CFQ scheduler.

This combination reduced the time of many of our jobs significantly (CDH2 and CDH3) -- especially
job chains with many small tasks mixed in.

The Linux tuning parameters that have a big effect on disk performance and pagecache behavior:
- readahead (e.g. blockdev --setra 4096 /dev/sda)
- ext4 mount options: inode_readahead_blks=n and commit=nrsec

> Add support in native libs for OS buffer cache management
> ---------------------------------------------------------
>                 Key: HADOOP-7714
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7714
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: graphs.pdf, hadoop-7714-2.txt, hadoop-7714-20s-prelim.txt
> Especially in shared HBase/MR situations, management of the OS buffer cache is important.
> Currently, running a big MR job will evict all of HBase's hot data from cache, causing HBase
> performance to really suffer. However, caching of the MR input/output is rarely useful, since
> the datasets tend to be larger than the cache and are not re-read often enough for caching to
> pay off. Having access to the native calls {{posix_fadvise}} and {{sync_data_range}} on
> platforms where they are supported would allow us to do a better job of managing this cache.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

