hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4755) HBase based block placement in DFS
Date Tue, 19 Feb 2013 21:27:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581646#comment-13581646 ]

Enis Soztutar commented on HBASE-4755:
--------------------------------------

bq. 2) The next step is to have the creation of store files honor this region placement.
Is this patch useful without giving hints to DFSClient about block placement? I thought we
still don't have the plumbing in Hadoop yet.
                
> HBase based block placement in DFS
> ----------------------------------
>
>                 Key: HBASE-4755
>                 URL: https://issues.apache.org/jira/browse/HBASE-4755
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.0
>            Reporter: Karthik Ranganathan
>            Assignee: Christopher Gist
>         Attachments: 4755-wip-1.patch
>
>
> As is, the feature is only useful for HBase clusters that care about data locality on
> regionservers, but it can also enable a lot of nice features down the road.
> The basic idea is as follows: instead of letting HDFS determine where to replicate data
> (r=3) by placing blocks on various regions, it is better to let HBase do so by providing
> hints to HDFS through the DFS client. That way, instead of replicating data at a block
> level, we can replicate data at a per-region level (each region owned by a primary, a
> secondary and a tertiary regionserver). This is better for two reasons:
> - Can make region failover faster on clusters which benefit from data affinity
> - On large clusters with random block placement policy, this helps reduce the probability
>   of data loss
> The algorithm is as follows:
> - Each region in META will have 3 columns which are the preferred regionservers for that
>   region (primary, secondary and tertiary)
> - Preferred assignment can be controlled by a config knob
> - Upon cluster start, HMaster will enter a mapping from each region to 3 regionservers
>   (random hash, could use current locality, etc.)
> - The load balancer would assign out regions preferring region assignments to primary
>   over secondary over tertiary over any other node
> - Periodically (say weekly, configurable) the HMaster would run a locality check and
>   make sure the map it has for region to regionservers is optimal.
> Down the road, this can be enhanced to control region placement in the following cases:
> - Mixed hardware SKU where some regionservers can hold fewer regions
> - Load balancing across tables where we don't want multiple regions of a table to get
>   assigned to the same regionservers
> - Multi-tenancy, where we can restrict the assignment of the regions of some table to
>   a subset of regionservers, so an abusive app cannot take down the whole HBase cluster.
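
The assignment preference in the quoted description (primary over secondary over
tertiary over any other node) could be sketched roughly as below. This is only an
illustration, not code from the attached patch: the class and method names are
hypothetical and not HBase APIs, and the seeded shuffle merely stands in for the
"random hash" mentioned in the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the preference order; none of these names exist in HBase. */
final class PreferredAssignmentSketch {

    /** Picks three distinct preferred servers (primary, secondary, tertiary) for a region. */
    static List<String> pickPreferred(String regionName, List<String> servers) {
        if (servers.size() < 3) {
            throw new IllegalArgumentException("need at least 3 regionservers");
        }
        List<String> shuffled = new ArrayList<>(servers);
        // Assumption: seed from the region name so the choice is repeatable per region.
        Collections.shuffle(shuffled, new Random(regionName.hashCode()));
        return new ArrayList<>(shuffled.subList(0, 3));
    }

    /** Returns the first preferred server that is live, else any live server, else null. */
    static String chooseServer(List<String> preferred, List<String> liveServers) {
        for (String candidate : preferred) {
            if (liveServers.contains(candidate)) {
                return candidate;
            }
        }
        // No preferred server is available: fall back to any other node.
        return liveServers.isEmpty() ? null : liveServers.get(0);
    }
}
```

In this sketch a balancer would call pickPreferred once per region at cluster start,
store the result alongside the region, and call chooseServer whenever the region needs
a new home. The periodic locality check from the description is not modeled here.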

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
