hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4177) Handling read failures during recovery - when HMaster calls Namenode recovery, recovery may be a failure leading to read failure while splitting logs
Date Mon, 08 Aug 2011 18:14:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081098#comment-13081098 ]

Ted Yu commented on HBASE-4177:
-------------------------------

Looking at FSUtils.recoverFileLease(), we check the type of fs inside the while loop. This is
unnecessary.

W.r.t. the soft limit for the lease, we have:
{code}
          if (waitedFor > FSConstants.LEASE_SOFTLIMIT_PERIOD) {
            LOG.warn("Waited " + waitedFor + "ms for lease recovery on " + p +
              ":" + e.getMessage());
          }
{code}
I think we should wait for the remainder of the soft limit (which is 60 seconds).
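
As a rough illustration of "waiting for the remainder of the soft limit", the sketch below computes how much of the 60-second window is still left after waitedFor milliseconds. This is only one reading of the suggestion. The class LeaseWait, the method remainingSoftLimitMs and the constant SOFT_LIMIT_MS are hypothetical names that are not part of FSUtils, and the 60000 ms value is taken from the figure quoted above rather than read from FSConstants.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of HBase's FSUtils. */
final class LeaseWait {
    // Assumed value: the 60-second soft limit mentioned in the comment above.
    static final long SOFT_LIMIT_MS = TimeUnit.SECONDS.toMillis(60);

    private LeaseWait() {}

    /**
     * Returns how many milliseconds of the soft limit are still left
     * after having already waited waitedForMs. Never negative.
     */
    static long remainingSoftLimitMs(long waitedForMs) {
        return Math.max(0L, SOFT_LIMIT_MS - waitedForMs);
    }
}
```

A retry loop could sleep for remainingSoftLimitMs(waitedFor) before its next attempt instead of a fixed interval. Whether that is the right policy for recoverFileLease() is a judgment call for the patch.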


> Handling read failures during recovery - when HMaster calls Namenode recovery, recovery may be a failure leading to read failure while splitting logs
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4177
>                 URL: https://issues.apache.org/jira/browse/HBASE-4177
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>
> As per the mailing list thread with the heading
> 'Handling read failures during recovery', we found this problem.
> As part of splitting logs, the HMaster calls Namenode recovery. The recovery is an asynchronous
> process.
> In HDFS
> =======
> Even though the client gets the updated block info from the Namenode on the first
> read failure, the client discards the new info and uses only the old info
> to retrieve the data from the datanode. So all the read
> retries fail. [Method parameter reassignment - not reflected in
> caller].
> In HBASE
> =======
> In the HMaster code we tend to wait for 1 sec. But if the recovery had some failure, then
> the log split may not happen, which may lead to data loss.
> So maybe we need to decide on the actual delay that needs to be introduced once the HMaster
> calls NN recovery.
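
The HDFS-side note in the quoted description ("Method parameter reassignment - not reflected in caller") describes a general Java pitfall: assigning a new object to a method parameter only changes the callee's local copy of the reference. The sketch below shows that pitfall in isolation. It is not the HDFS client code; ParamReassignDemo, BlockInfoRef and the "dn1".."dn4" location strings are hypothetical stand-ins for the block info discussed in the issue.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo of the pitfall; not taken from the HDFS client. */
final class ParamReassignDemo {
    /** Hypothetical holder that lets a callee hand refreshed info back to its caller. */
    static final class BlockInfoRef {
        List<String> locations;

        BlockInfoRef(List<String> locations) {
            this.locations = locations;
        }
    }

    private ParamReassignDemo() {}

    // Buggy shape: the parameter is reassigned, so the caller's variable
    // still points at the old list after this method returns.
    static void refreshByReassignment(List<String> locations) {
        locations = Arrays.asList("dn3", "dn4"); // visible only inside this method
    }

    // One possible fix: update a field on an object the caller also holds.
    static void refreshViaHolder(BlockInfoRef ref) {
        ref.locations = Arrays.asList("dn3", "dn4");
    }
}
```

After refreshByReassignment(stale), the caller's list still holds the old locations, which matches the symptom of retries reusing old block info. Returning the new value from the method would be another way to get it back to the caller.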

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
