hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6572) Tiered HFile storage
Date Sun, 11 May 2014 11:04:17 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andrew Purtell updated HBASE-6572:

    Assignee:     (was: Andrew Purtell)

> Tiered HFile storage
> --------------------
>                 Key: HBASE-6572
>                 URL: https://issues.apache.org/jira/browse/HBASE-6572
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Purtell
> Consider how we might enable tiered HFile storage. If HDFS has the capability, we could
create files that are likely to be frequently accessed, especially for random reads, on solid
state devices, and create all others (the default) on spinning media as before. We could also
support moving frequently read HFiles from spinning media to solid state. We already have CF
statistics for this and would only need to add the requisite admin interface; we could even
consider an autotiering option.
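> As a rough illustration of what an autotiering policy could look like, the sketch below greedily
picks the files with the most random reads per byte until a solid state budget is used up. It is
only a sketch under stated assumptions: the per-file read counter, the HFileStats record, the
pick_files_for_ssd function, and the budget parameter are all hypothetical names and are not part
of HBase or HDFS. Existing statistics are per CF, so per-file counts are assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HFileStats:
    """Hypothetical per-file statistics; not an HBase type."""
    path: str          # illustrative file name
    size_bytes: int    # on-disk size of the file
    random_reads: int  # assumed per-file random read counter


def pick_files_for_ssd(files, ssd_budget_bytes, min_reads=1):
    """Greedily choose files to promote to solid state.

    Files are ranked by random reads per byte (highest first) and added
    while they still fit in the remaining budget. Files with fewer than
    min_reads reads, or with zero size, are never promoted.
    """
    candidates = [
        f for f in files
        if f.random_reads >= min_reads and f.size_bytes > 0
    ]
    candidates.sort(key=lambda f: f.random_reads / f.size_bytes, reverse=True)

    chosen = []
    used = 0
    for f in candidates:
        if used + f.size_bytes <= ssd_budget_bytes:
            chosen.append(f)
            used += f.size_bytes
    return chosen
```

Ranking by reads per byte rather than by raw read count favors small, hot files, which is one
plausible way to get the most random-read benefit from limited solid state capacity. A real policy
would also need to demote files that cool off, which this sketch does not attempt.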
> Dhruba Borthakur did some early work in this area and wrote up his findings:
http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html
It is important to note the findings, but I suggest most of the recommendations are out of
scope for this JIRA. This JIRA seeks to find an initial use case that produces a reasonable
benefit and serves as a testbed for further improvements. If I may paraphrase Dhruba's findings
(any misstatements and errors are mine): First, the DFSClient code paths introduce significant
latency, so the HDFS client (and presumably the DataNode, as the next bottleneck) will need
significant work to bring that latency down. We need to investigate optimized (perhaps read-only)
DFS clients and server-side read and caching strategies. Second, RegionServers are heavily
threaded, which imposes a lot of monitor contention and context switching cost. We need to
investigate reducing the number of threads in a RegionServer, as well as nonblocking IO and RPC.

This message was sent by Atlassian JIRA
