hbase-issues mailing list archives

From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-6772) Make the Distributed Split HDFS Location aware
Date Thu, 13 Sep 2012 12:54:07 GMT
nkeywal created HBASE-6772:
------------------------------

             Summary: Make the Distributed Split HDFS Location aware
                 Key: HBASE-6772
                 URL: https://issues.apache.org/jira/browse/HBASE-6772
             Project: HBase
          Issue Type: Improvement
          Components: master, regionserver
    Affects Versions: 0.96.0
            Reporter: nkeywal


During an hlog split, each log file (a single HDFS block) is allocated to a different region
server. That region server reads the file and creates the recovered edits files.
The allocation to the region server is currently random. We could instead take into account the
HDFS locations of the log file to split:
- the reads would be local, hence faster. This allows short-circuit reads as well.
- less network I/O would be used during a failure (and this is important).
- we would be sure to read from a working datanode, so we should not hit read errors.
Read errors slow the split process a lot, as we often end up waiting for timeouts.

However, we need to limit the number of calls to the namenode.

A typical algorithm could be:
- the master gets the locations of the hlog files
- it writes them into ZK, if possible in one transaction (this way all the tasks become visible
together, allowing the region servers to arbitrate between them).
- when a region server receives the event, it checks all the logs and all their locations.
- if there is a match, it takes that task
- if not, it waits something like 0.2s (to give the other region servers time to take the tasks
whose location matches theirs), and then takes any remaining task.
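
The region server side of these steps could be sketched roughly as follows. This is only an
illustration under assumptions: the class LocalityAwareTaskPicker and all of its methods are
hypothetical names, not existing HBase code, and in-memory maps stand in for the task list and
the task ownership that would really live in ZK.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class LocalityAwareTaskPicker {
    /** The 0.2s from the proposal: lets region servers with a local replica claim first. */
    static final long GRACE_MILLIS = 200;

    // Stand-in for the task list written by the master: log file -> hosts holding its block.
    private final Map<String, Set<String>> locationsByLog;
    // Stand-in for task ownership in ZK: log file -> region server that claimed it.
    private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

    LocalityAwareTaskPicker(Map<String, Set<String>> locationsByLog) {
        this.locationsByLog = locationsByLog;
    }

    /** Atomically claims a task; only the first caller for a given log wins. */
    boolean tryClaim(String log, String server) {
        return owners.putIfAbsent(log, server) == null;
    }

    /** Claims an unowned log that has a replica on this server's host, if any. */
    Optional<String> claimLocal(String server) {
        for (Map.Entry<String, Set<String>> e : locationsByLog.entrySet()) {
            if (e.getValue().contains(server) && tryClaim(e.getKey(), server)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    /** Claims any unowned log, regardless of where its block is stored. */
    Optional<String> claimAny(String server) {
        for (String log : locationsByLog.keySet()) {
            if (tryClaim(log, server)) {
                return Optional.of(log);
            }
        }
        return Optional.empty();
    }

    /** Local match first; otherwise wait for the grace period, then take what is left. */
    Optional<String> nextTask(String server) {
        Optional<String> local = claimLocal(server);
        if (local.isPresent()) {
            return local;
        }
        try {
            Thread.sleep(GRACE_MILLIS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt and give up
            return Optional.empty();
        }
        return claimAny(server);
    }
}
```

A region server would call nextTask with its own host name each time it is notified of new
tasks. The atomic claim is what makes the arbitration safe: two servers that both see a match
cannot both take the same log.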

The drawbacks are:
- a 0.2s delay is added if no region server is available on any of the locations. It is likely
possible to remove it with some extra synchronization.
- a small increase in complexity and in the dependency on HDFS

Considering the advantages, it's worth it imho.

