hbase-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10689) Explore advisory caching for MR over snapshot scans
Date Mon, 10 Mar 2014 01:44:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925386#comment-13925386 ]

Colin Patrick McCabe commented on HBASE-10689:

[~stack], there are multiple kinds of caching in HDFS.  The path-based caching added in HDFS-4949
caches at the file level, so you are right that it is not that useful for HBase.  The advisory
caching API is a little different.  It allows the application to control how much readahead
HDFS does and a little bit about how the page cache is used.

When HBase reads a 64 KB chunk, HDFS currently loads a 4 MB segment from disk.  The
rest of that 4 MB is thrown away unless HBase goes on to read it.  HBase could avoid this
by calling DFSInputStream#setReadahead(65536).  Unless HBase is doing something smart with
the rest of that 4 MB, this seems like it might be a good idea.
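
To put rough numbers on the readahead cost described above, here is a minimal sketch of the arithmetic.  It assumes the worst case, where nothing beyond the requested chunk is ever used, and takes the 4 MB and 64 KB figures from this comment.  The class and method names are made up for illustration and are not part of HBase or HDFS.

```java
public class ReadaheadWaste {
    /** Bytes pulled off disk by readahead that the caller never asked for. */
    static long wastedBytes(long readaheadBytes, long requestedBytes) {
        return Math.max(0L, readaheadBytes - requestedBytes);
    }

    /** Ratio of bytes read from disk to bytes actually requested. */
    static double amplification(long readaheadBytes, long requestedBytes) {
        return (double) Math.max(readaheadBytes, requestedBytes) / requestedBytes;
    }

    public static void main(String[] args) {
        long chunk = 64L * 1024;                  // 64 KB chunk read by HBase
        long defaultReadahead = 4L * 1024 * 1024; // 4 MB segment loaded by HDFS
        long tunedReadahead = 65536L;             // value suggested for setReadahead

        // Worst case with the 4 MB readahead: everything past the chunk is discarded.
        System.out.println(wastedBytes(defaultReadahead, chunk));
        System.out.println(amplification(defaultReadahead, chunk));

        // With readahead matched to the chunk size, nothing extra is read.
        System.out.println(wastedBytes(tunedReadahead, chunk));
        System.out.println(amplification(tunedReadahead, chunk));
    }
}
```

Under these assumptions, each 64 KB read drags in 64 times as much data as it needs, which is the overhead that matching the readahead to the chunk size would remove.  If HBase does end up reading the following bytes sequentially, the real cost is lower than this worst case.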

> Explore advisory caching for MR over snapshot scans
> ---------------------------------------------------
>                 Key: HBASE-10689
>                 URL: https://issues.apache.org/jira/browse/HBASE-10689
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
> Per [comment|https://issues.apache.org/jira/browse/HBASE-10660?focusedCommentId=13921730&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13921730]
> on HBASE-10660, explore using the new HDFS advisory caching feature introduced in HDFS-4817
> for TableSnapshotInputFormat.

This message was sent by Atlassian JIRA
