hbase-dev mailing list archives

From "Varun Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-8389) HBASE-8354 DDoSes Namenode with lease recovery requests
Date Sun, 21 Apr 2013 17:35:15 GMT
Varun Sharma created HBASE-8389:

             Summary: HBASE-8354 DDoSes Namenode with lease recovery requests
                 Key: HBASE-8389
                 URL: https://issues.apache.org/jira/browse/HBASE-8389
             Project: HBase
          Issue Type: Improvement
         Description: We ran HBase 0.94.3 patched with HBASE-8354 and observed too many
outstanding lease recoveries because of the short 1-second retry interval between recoverLease
calls.

The namenode gets into the following loop:
1) It receives a lease recovery request and initiates recovery, choosing a primary datanode
each time.
2) A lease recovery succeeds and the namenode tries to commit the block under recovery as
finalized - this takes < 10 seconds in our environment since we run with tight HDFS socket
timeouts.
3) By the time step 2) completes, a more recent recovery has been enqueued because of the
aggressive retries. This causes the committed block to get preempted, and we enter a vicious
cycle.

So we do: <initiate_recovery> --> <commit_block> --> <commit_preempted_by_another_recovery>

This loop only stops after 300 seconds, which is the "hbase.lease.recovery.timeout". Hence the
MTTR we are observing is 5 minutes, which is terrible. Our ZK session timeout is 30 seconds
and the HDFS stale node detection timeout is 20 seconds.
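
To make the starvation concrete, here is a toy timing model (illustrative Java only - not
actual NameNode or HBase code; the numbers are the ones reported above). A recovery can only
be finalized if it gets a quiet window longer than the commit latency, and with 1-second
retries it never does:

{code:java}
// Toy model of the vicious cycle above; all numbers come from this report.
public class RecoveryPreemption {
  public static void main(String[] args) {
    final long retryIntervalMs = 1000;   // recoverLease retry interval after HBASE-8354
    final long commitLatencyMs = 10000;  // observed time to finalize a recovery (< 10 s)
    final long giveUpMs = 300000;        // "hbase.lease.recovery.timeout"

    long now = 0;
    long recoveryStart = 0;              // when the in-flight recovery began
    while (now < giveUpMs) {
      long commitDone = recoveryStart + commitLatencyMs;
      long nextRetry = now + retryIntervalMs;
      if (commitDone <= nextRetry) {
        System.out.println("block finalized at t=" + commitDone + " ms");
        return;
      }
      // The retry lands before the commit completes, so the commit is
      // preempted and a fresh recovery starts: the cycle repeats.
      recoveryStart = nextRetry;
      now = nextRetry;
    }
    System.out.println("gave up after " + giveUpMs + " ms - the 5 minute MTTR we observe");
  }
}
{code}

With a retry interval longer than the commit latency (the generous timeout proposed in 1)
below), the same model finalizes on the first pass.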

Note that before the patch, we did not call recoverLease so aggressively - also, it seems that
the HDFS namenode is pretty dumb in that it keeps initiating a new recovery for every call.
Before the patch, we would call recoverLease, assume that the block was recovered, and try to
get the file; it has zero length while it is under recovery, so we fail the task and retry
until we get a non-zero length. So things just work.
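
For reference, the pre-patch flow is roughly the following sketch (not the actual HBase code;
the method name and poll interval are illustrative):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class PrePatchLeaseRecovery {
  // Issue recoverLease once, then effectively wait the recovery out by
  // retrying the read: the file reports zero length while it is still
  // under recovery.
  static void waitForRecovery(FileSystem fs, Path log)
      throws IOException, InterruptedException {
    if (fs instanceof DistributedFileSystem) {
      ((DistributedFileSystem) fs).recoverLease(log);  // fire once, assume success
    }
    while (fs.getFileStatus(log).getLen() == 0) {
      Thread.sleep(1000);  // "fail the task and retry", condensed into a poll
    }
  }
}
{code}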

1) Expecting recovery to occur within 1 second is too aggressive. We need a more generous
timeout, and the timeout needs to be configurable, since the recovery typically takes as much
time as the DFS timeouts: the primary datanode doing the recovery tries to reconcile the blocks
and hits those timeouts when it tries to contact the dead node. So the recovery is only as
fast as the HDFS timeouts.

2) We have another issue, which I reported in HDFS-4721: the Namenode chooses the stale
datanode to perform the recovery (since it is still considered alive). Hence the first recovery
request is bound to fail. So if we want a tight MTTR, we either need something like HDFS-4721
or we need something like this:
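
(Sketch only - the method and parameter names below are placeholders, not committed code.)

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class TwoPhaseLeaseRecovery {
  static void recover(DistributedFileSystem dfs, Path p,
                      long firstTimeoutMs,       // short, e.g. ~1 second
                      long configuredTimeoutMs)  // generous, on the order of the DFS timeouts
      throws IOException, InterruptedException {
    // Step #1: likely lands on the stale datanode (HDFS-4721) and is moot.
    if (dfs.recoverLease(p)) {
      return;
    }
    Thread.sleep(firstTimeoutMs);
    // Subsequent attempts get the full configured window to complete.
    while (!dfs.recoverLease(p)) {
      Thread.sleep(configuredTimeoutMs);
    }
  }
}
{code}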


Where configuredTimeout should be large enough to let the recovery happen, but the first
timeout is short so that we quickly get past the moot recovery in step #1.

            Reporter: Varun Sharma
            Assignee: Varun Sharma
             Fix For: 0.94.8

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
