hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-57) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
Date Sun, 29 Mar 2009 21:47:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693621#action_12693621 ]

stack commented on HBASE-57:

Reading local blocks directly, bypassing the datanode, is HADOOP-4801.  In summary, the payoff
from short-circuiting the datanode is small, or at least has yet to be demonstrated, and it seems
doubtful that a second route to the data will be opened because of security concerns, etc.
That's my take on the issue (it could change, of course).

I think that even if the only savings were in network traffic, that'd be reason enough to implement
locality algorithms.  JK makes an interesting point above: we could manufacture hot datanodes
if we blindly serve regions from a datanode that hosts all the data.  But this can happen now,
since we operate blindly, and it's only smart use of the locality info that will help damp hot

Samuel, if you're still interested, have you applied to become a GSoC student using this issue
as your project?  (Add in some of JK's notes on the need to research what happens in a running
cluster, so we know best what to implement.)

> [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>                 Key: HBASE-57
>                 URL: https://issues.apache.org/jira/browse/HBASE-57
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.2.0
>            Reporter: stack
>             Fix For: 0.20.0
> Currently, regions are assigned to regionservers based on a basic loading attribute.  A
> factor to include in the assignment calculation is the location of the region in hdfs; i.e.
> the servers hosting region replicas.  If the cluster is such that regionservers are being run
> on the same nodes as those running hdfs, then ideally the regionserver for a particular region
> should be running on a server that hosts a replica of that region.
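
As an illustration only, here is a minimal sketch of the assignment idea described above: prefer a regionserver that hosts a replica of the region, but cap the load per server so that locality alone does not manufacture a hot node. Everything in it is an assumption made for illustration: the function name assign_regions, the input dictionaries, and the max_load cap are hypothetical and are not HBase or HDFS APIs. Rack awareness is left out entirely.

```python
def assign_regions(region_replicas, server_load, max_load):
    """Assign each region to a regionserver, preferring replica-local hosts.

    region_replicas: dict of region name -> list of hosts holding a replica
                     (hypothetical input; a real master would ask HDFS)
    server_load:     dict of regionserver host -> regions currently served
    max_load:        soft cap on regions per server (hypothetical knob)
    """
    if not server_load:
        raise ValueError("no regionservers available")

    load = dict(server_load)  # work on a copy; do not mutate the caller's dict
    assignment = {}

    for region in sorted(region_replicas):
        hosts = region_replicas[region]
        # Servers that hold a replica of this region and are under the cap.
        local = [h for h in hosts if h in load and load[h] < max_load]
        # If no local server has room, fall back to every server. The cap is
        # soft: when all servers are at the cap, the least loaded still wins.
        candidates = local or list(load)
        # Least-loaded candidate wins; ties are broken by host name.
        chosen = min(candidates, key=lambda h: (load[h], h))
        assignment[region] = chosen
        load[chosen] += 1

    return assignment
```

The cap is what keeps the sketch from serving every region off the one node that happens to hold all the data; what the right cap is, and whether a cap is the right mechanism at all, would need measurements from a running cluster.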

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.