hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6772) Make the Distributed Split HDFS Location aware
Date Wed, 06 Mar 2013 18:30:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594938#comment-13594938
] 

Jeffrey Zhong commented on HBASE-6772:
--------------------------------------

[~tedyu@apache.org] Thanks for commenting on this.

The sleepTime is for retrying purpose and only used when listChildrenAndWatchForNewChildren
hit errors or splitLogZNode doesn't exist. 

In normal case, the following line set a watch on splitLogZNode and returns. Zookeeper will
notify region servers to grab a task as soon as a new split task is saved into ZK.

{code}
        childrenPaths = ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
            this.watcher.splitLogZNode);
        if (childrenPaths != null) {
          return childrenPaths;
        }
{code} 
                
> Make the Distributed Split HDFS Location aware
> ----------------------------------------------
>
>                 Key: HBASE-6772
>                 URL: https://issues.apache.org/jira/browse/HBASE-6772
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: Jeffrey Zhong
>
> During a hlog split, each log file (a single hdfs block) is allocated to a different
region server. This region server reads the file and creates the recovery edit files.
> The allocation to the region server is random. We could take into account the locations
of the log file to split:
> - the reads would be local, hence faster. This allows short circuit as well.
> - less network i/o used during a failure (and this is important)
> - we would be sure to read from a working datanode, hence we're sure we won't have read
errors. Read errors slow the split process a lot, as we often enter the "timeouted world".

> We need to limit the calls to the namenode however.
> Typical algo could be:
> - the master gets the locations of the hlog files
> - it writes it into ZK, if possible in one transaction (this way all the tasks are visible
alltogether, allowing some arbitrage by the region server).
> - when the regionserver receives the event, it checks for all logs and all locations.
> - if there is a match, it takes it
> - if not it waits something like 0.2s (to give the time to other regionserver to take
it if the location matches), and take any remaining task.
> Drawbacks are:
> - a 0.2s delay added if there is no regionserver available on one of the locations. It's
likely possible to remove it with some extra synchronization.
> - Small increase in complexity and dependency to HDFS
> Considering the advantages, it's worth it imho.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message