Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of maheswara@huawei.com
 designates 206.16.17.211 as permitted sender)
Date: Tue, 09 Aug 2011 19:08:27 +0500
From: Uma Maheswara Rao G 72686 <maheswara@huawei.com>
Subject: Re: FW: Handling read failures during recovery
In-reply-to: <F9E888918C7B4228B72F5C9D1E9389B7@china.huawei.com>
To: hdfs-dev@hadoop.apache.org, dev@hbase.apache.org
Cc: ramakrishnas@huawei.com
Message-id: <fe8ae347206d.206dfe8ae347@huawei.com>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-language: en
Content-transfer-encoding: 7BIT
Content-disposition: inline
Priority: normal
References: <F9E888918C7B4228B72F5C9D1E9389B7@china.huawei.com>


Hi All,

Any thoughts?

It looks like HBase is going to address this issue: https://issues.apache.org/jira/browse/HBASE-4177

Do we need to address it in HDFS as well?

If a read request arrives before the client recovery process has completed, do we need to make the read operation wait until recovery completes successfully?


Regards,
Uma
> -----Original Message-----
> From: Ramkrishna S Vasudevan [mailto:ramakrishnas@huawei.com] 
> Sent: Friday, August 05, 2011 9:52 AM
> To: hdfs-dev@hadoop.apache.org; dev@hbase.apache.org
> Subject: RE: Handling read failures during recovery
> 
> Hi 
> 
> As Laxman pointed out, there is a potential problem here. We expect the
> NameNode recovery to complete within a specified time, so we sleep for one
> second in the splitLog() logic. But we then carry on reading the HLog file,
> which will fail if recovery is still in progress. If the logs are not split
> properly, there could be data loss.
> 
> 
> 
> Regards
> Ram
> 
> 
> 
> -----Original Message-----
> From: Laxman [mailto:lakshman_ch@huawei.com] 
> Sent: Tuesday, August 02, 2011 10:47 AM
> To: hdfs-dev@hadoop.apache.org; dev@hbase.apache.org
> Subject: FW: Handling read failures during recovery
> 
> Partial mail was sent accidentally. Sorry for that.
> Resending with complete details, analysis and logs.
> 
> We are using the 20-append version.
> 
> To summarize, we noticed two problems in this flow, one each from HDFS and
> HBase.
> 
> 
> 1) From HDFS
> Even though the client gets the updated block info from the NameNode on the
> first read failure, it discards the new info and keeps using the old info to
> retrieve the data from the DataNode. So all the read retries fail. [Method
> parameter reassignment is not reflected in the caller.]
> 
> 
> HDFS Code snippet
> org.apache.hadoop.hdfs.DFSClient.DFSInputStream.chooseDataNode 
> 
> private DNAddrPair chooseDataNode(LocatedBlock block) 
>      throws IOException {
> ...
> ...
> block = getBlockAt(block.getStartOffset(), false);
> ...
> ...
> }
> 
> Here the method parameter "block" is assigned the new block info, but that
> assignment is not reflected in the caller "blockSeekTo(long target)".
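
To make the parameter reassignment point concrete, here is a minimal,
self-contained sketch. It is not the DFSClient code: BlockInfo, refetch,
chooseAndDiscard and chooseAndReturn are hypothetical names, and the
generation stamp is only an illustrative field. It shows that reassigning a
parameter inside a method leaves the caller's variable pointing at the old
object, and that returning the refreshed value is one way to propagate it.

```java
// Hypothetical stand-in for LocatedBlock; only a generation stamp is modelled.
final class BlockInfo {
    final long generationStamp;

    BlockInfo(long generationStamp) {
        this.generationStamp = generationStamp;
    }
}

final class ParameterReassignmentDemo {
    // Hypothetical stand-in for asking the NameNode for refreshed block info.
    static BlockInfo refetch(BlockInfo stale) {
        return new BlockInfo(stale.generationStamp + 1);
    }

    // Mirrors the reported pattern: the parameter is reassigned locally,
    // so the caller's variable still refers to the stale object.
    static void chooseAndDiscard(BlockInfo block) {
        block = refetch(block);
    }

    // One possible fix: return the refreshed info so the caller can adopt it.
    static BlockInfo chooseAndReturn(BlockInfo block) {
        return refetch(block);
    }

    public static void main(String[] args) {
        BlockInfo block = new BlockInfo(3094);
        chooseAndDiscard(block);
        System.out.println(block.generationStamp); // caller still holds the stale value
        block = chooseAndReturn(block);
        System.out.println(block.generationStamp); // caller now holds the refreshed value
    }
}
```

Whether the real fix should return the new block, update a field, or
restructure the retry loop is a design decision for the HDFS code; the sketch
only illustrates the language behaviour behind the symptom.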
> 
> 2) From HBase
> 
> Excerpt from my previous mail.
> 
> > Because recovery is asynchronous, the recoverLease call returns
> > immediately, and a subsequent read may fail while the recovery is still
> > in progress.
> > 
> > This may leave some regions stuck in the offline state.
> > 
> > One approach is to introduce a delay between recovery and read. But this
> > may not be a foolproof way to address the problem.
> 
> I've noticed that this delay is already present in the HBase code. But as I
> mentioned, it may not be a foolproof mechanism for handling this scenario.
> 
> HBase Code snippet
> In the class HLogSplitter, splitLog() calls recoverFileLease().
> 
> In recoverFileLease() 
> 
>      try { 
>        Thread.sleep(1000); 
>      } catch (InterruptedException ex) { 
>        new InterruptedIOException().initCause(ex); 
>      } 
> 
> Once the recover call is made, we sleep for one second and then proceed with
> parseHLog().
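
As an alternative to a single fixed one-second sleep, the wait could be
expressed as a bounded poll. The following is only a sketch under stated
assumptions: LeaseRecovery and RecoveryWaiter are hypothetical names, not
HDFS or HBase APIs, and the sketch assumes the recovery call can report
whether the file is already closed. A real call would also throw
IOException, which is left out here to keep the example short.

```java
// Hypothetical abstraction over the file system call; not an HDFS or HBase API.
interface LeaseRecovery {
    // Assumed contract: returns true once the file is closed and safe to read.
    boolean recoverLease(String path);
}

final class RecoveryWaiter {
    // Polls until recovery reports completion or the attempts run out.
    // Returns true only if the file was reported as recovered.
    static boolean waitForRecovery(LeaseRecovery fs, String path,
                                   int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return true;
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the last attempt
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException ex) {
                // Restore the interrupt flag and report "not recovered".
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

With something like this, the caller would start parsing the log only when
waitForRecovery returns true, and would treat false as a failure to handle
explicitly rather than reading a file that is still under recovery. The
attempt count and pause are tuning parameters, not values taken from the
HBase code.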
> 
> 
> Here is the log
> 2011-07-21 17:01:19,642 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1311262402613_3094 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
> 2011-07-21 17:01:20,650 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1311262402613_3094 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
> 2011-07-21 17:01:21,669 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1311262402613_3094 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
> 2011-07-21 17:01:22,677 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_1311262402613_3318 file=/hbase/.logs/158-1-101-222,20020,1311260346420/158-1-101-222%3A20020.1311265398432
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2491)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2256)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2441)
>     at java.io.DataInputStream.read(DataInputStream.java:132)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1984)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1884)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:201)
> 
> 
> -----Original Message-----
> From: Stack [mailto:saint.ack@gmail.com] 
> Sent: Monday, August 01, 2011 9:03 PM
> To: dev@hbase.apache.org
> Cc: dev@hbase.apache.org
> Subject: Re: Handling read failures during recovery
> 
> Which HDFS version, and what is the error you see?  Thanks.
> 
> Stack
> 
> 
> 
> On Aug 1, 2011, at 4:33, Laxman <lakshman_ch@huawei.com> wrote:
> 
> > Hi Everyone,
> > 
> > 
> > 
> > In HBase we try to recover the HLog file and then immediately proceed
> > with the read operation.
> > 
> > Because recovery is asynchronous, the recoverLease call returns
> > immediately, and a subsequent read may fail while the recovery is still
> > in progress.
> > 
> > This may leave some regions stuck in the offline state.
> > 
> > One approach is to introduce a delay between recovery and read. But this
> > may not be a foolproof way to address the problem.
> > 
> > 
> > 
> > How do we handle this scenario? 
> > 
> > 
> > 
> > Please correct me if my understanding is wrong.
> > 
> > --Laxman
> > 
> 
>