hbase-issues mailing list archives

From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4755) HBase based block placement in DFS
Date Mon, 04 Mar 2013 21:21:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592651#comment-13592651 ]

nkeywal commented on HBASE-4755:
--------------------------------

It's quite good imho. I don't have many issues or warnings; I think it's just going to
work.

Some comments:
bq. Creation of table flow (assuming pre-split table)
All this is great imho. Taking into account the racks is quite important. AssignmentDomain
seems good.

bq.  2. The meta table is updated with the information about the location mapping for the
added regions.
I understand this as 'meta holds the favored nodes information'. That's fine with me.

bq. Failure recovery
The difficult point is to choose the third RS now: we've got one missing. Some comments:
-> We now have 2 RS on the same rack. So the config will be primary & secondary on
the same rack and tertiary on another (not ideal).
-> We can imagine a situation where the first RS will come back to life soon (rolling restart
for example).
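
To make the rack constraint above concrete, here is a minimal sketch of picking a replacement third RS after one favored node is lost. It is not taken from the patch: the function name, the `rack_of` mapping and the "first candidate in sorted order" choice are all hypothetical, and a real implementation would presumably also weigh load and wait a little in case the lost RS comes back.

```python
def recover_favored_nodes(favored, failed, rack_of, live_servers):
    """Promote the surviving favored nodes and append a replacement.

    favored      -- ordered list [primary, secondary, tertiary]
    failed       -- the favored node that was lost
    rack_of      -- dict mapping server name to rack name (hypothetical input)
    live_servers -- iterable of servers currently alive
    """
    # Survivors keep their relative order, so the old secondary becomes primary.
    survivors = [rs for rs in favored if rs != failed]
    used_racks = {rack_of[rs] for rs in survivors}
    eligible = [rs for rs in live_servers if rs != failed and rs not in survivors]

    # Prefer a server on a rack the survivors do not already occupy.
    off_rack = sorted(rs for rs in eligible if rack_of[rs] not in used_racks)
    candidates = off_rack or sorted(eligible)
    if not candidates:
        return survivors  # nothing available; stay with two favored nodes
    return survivors + [candidates[0]]


rack_of = {"rs1": "rackA", "rs2": "rackB", "rs3": "rackB",
           "rs4": "rackA", "rs5": "rackC"}
new_favored = recover_favored_nodes(
    ["rs1", "rs2", "rs3"], "rs1", rack_of, ["rs2", "rs3", "rs4", "rs5"])
print(new_favored)
```

In this example the two survivors share rackB, so the result has primary and secondary on one rack and the new tertiary on another, which is exactly the "not ideal" layout described above.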

bq. TODO: Handle non pre-split tables
From a locality point of view, it's already an issue today: after the split, it seems
better to move them to different servers, but we just created files locally. I would tend
to think that it's the job of the next major compaction to clean all this up.



A first step could be to simply use the same servers for the WAL & the newly created
HFiles. Rationale:
- today the locality is achieved by triggering a major compaction, so this would remain ok.
- from a WAL point of view, with a 100-node cluster and 8 files per WAL, you will fail to
recover from a 5-node loss about 5% of the time. If all the files of a WAL go to the same
servers, we divide this probability by 8.
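
The 5% figure and the division by 8 can be reproduced with a rough calculation. This is a sketch under assumptions that are not stated in the comment: one WAL per node, one HDFS block per WAL file, 3 replicas placed uniformly at random on distinct nodes (real HDFS placement is rack-aware and usually writes the first replica locally), and independent placements.

```python
from math import comb


def p_any_placement_lost(nodes, lost, replicas, placements):
    """Probability that at least one placement has all replicas on lost nodes."""
    # Chance that a single random replica set falls entirely within the lost nodes.
    p_one = comb(lost, replicas) / comb(nodes, replicas)
    # Placements are treated as independent (an assumption of this sketch).
    return 1 - (1 - p_one) ** placements


# 100 WALs x 8 files, each file placed independently.
scattered = p_any_placement_lost(nodes=100, lost=5, replicas=3, placements=100 * 8)
# All 8 files of a WAL share one replica set: 100 independent placements.
grouped = p_any_placement_lost(nodes=100, lost=5, replicas=3, placements=100)

print(f"scattered: {scattered:.2%}, grouped: {grouped:.2%}, "
      f"ratio: {scattered / grouped:.1f}")
```

Under these assumptions the scattered case comes out close to 5% and the grouped case is smaller by a factor of just under 8, which matches the rationale above.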

A second step could be to use this in the balancer.
- if we lose a RS, we will also have locality on the new node.

                
> HBase based block placement in DFS
> ----------------------------------
>
>                 Key: HBASE-4755
>                 URL: https://issues.apache.org/jira/browse/HBASE-4755
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.94.0
>            Reporter: Karthik Ranganathan
>            Assignee: Christopher Gist
>            Priority: Critical
>         Attachments: 4755-wip-1.patch, hbase-4755-notes.txt
>
>
> The feature as-is is only useful for HBase clusters that care about data locality on regionservers,
but this feature can also enable a lot of nice features down the road.
> The basic idea is as follows: instead of letting HDFS determine where to replicate data
(r=3) by placing blocks on various regions, it is better to let HBase do so by providing hints
to HDFS through the DFS client. That way instead of replicating data at a block level, we
can replicate data at a per-region level (each region owned by a primary, a secondary and
a tertiary regionserver). This is better for 2 things:
> - Can make region failover faster on clusters which benefit from data affinity
> - On large clusters with random block placement policy, this helps reduce the probability
of data loss
> The algo is as follows:
> - Each region in META will have 3 columns which are the preferred regionservers for that
region (primary, secondary and tertiary)
> - Preferred assignment can be controlled by a config knob
> - Upon cluster start, HMaster will enter a mapping from each region to 3 regionservers
(random hash, could use current locality, etc)
> - The load balancer would assign out regions preferring region assignments to primary
over secondary over tertiary over any other node
> - Periodically (say weekly, configurable) the HMaster would run a locality check and
make sure the map it has for region to regionservers is optimal.
> Down the road, this can be enhanced to control region placement in the following cases:
> - Mixed hardware SKU where some regionservers can hold fewer regions
> - Load balancing across tables where we don't want multiple regions of a table to get
assigned to the same regionservers
> - Multi-tenancy, where we can restrict the assignment of the regions of some table to
a subset of regionservers, so an abusive app cannot take down the whole HBase cluster.
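
The assignment preference in the quoted description (primary over secondary over tertiary over any other node) can be sketched as follows. The helper name and its inputs are hypothetical and do not come from the HBase load balancer; the fallback to the alphabetically first live server is only a placeholder for whatever the balancer would really pick.

```python
def choose_server(favored, live_servers):
    """Return the first live favored server, else any live server, else None."""
    # favored is ordered: primary, secondary, tertiary.
    for rs in favored:
        if rs in live_servers:
            return rs
    # No favored node is alive: fall back to any other node (placeholder choice).
    return min(live_servers) if live_servers else None


print(choose_server(["rs1", "rs2", "rs3"], {"rs2", "rs3", "rs9"}))
```

Here the primary `rs1` is down, so the region goes to the secondary `rs2` rather than to an arbitrary node.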

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
