Date: Sat, 10 Sep 2011 16:52:09 +0000 (UTC)
From: "Uma Maheswara Rao G (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <30076953.13275.1315673529597.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1852146808.3829.1314638077753.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2296) If read error while lease is being recovered, client reverts to stale view on block info

[ 
https://issues.apache.org/jira/browse/HDFS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102081#comment-13102081 ]

Uma Maheswara Rao G commented on HDFS-2296:
-------------------------------------------

Thanks, Stack, for filing the JIRA.

Do we need to make all read operations wait until the recovery completes? Or, let's provide an API from DFS that tells whether a recovery is in progress, so that applications can use it to find out whether a given recovery request has completed. Any thoughts?

Todd & Jitendra, any suggestions? Can you please give your opinions?

Thanks,
Uma

> If read error while lease is being recovered, client reverts to stale view on block info
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-2296
>                 URL: https://issues.apache.org/jira/browse/HDFS-2296
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>   Affects Versions: 0.20-append, 0.22.0, 0.23.0
>            Reporter: stack
>            Priority: Critical
>
> We are seeing the following issue around recoverLease over in hbaselandia. DFSClient calls recoverLease to assume ownership of a file. The recoverLease call returns to the client, but it can take time for the new state to propagate. In the meantime, an incoming read fails even though it is using updated block info. Thereafter all read retries fail, because on exception we revert to the stale block view and never recover. Laxman reports this issue in the mailing thread below.
> See this thread for the first report of this issue: http://search-hadoop.com/m/S1mOHFRmgk2/%2527FW%253A+Handling+read+failures+during+recovery%2527&subj=FW+Handling+read+failures+during+recovery
> Chatting w/ Hairong offline, she suggests this is a general issue around lease recovery no matter how it is triggered (new recoverLease or not).
> I marked this critical. At least over in hbase it is, since we get stuck here recovering a crashed server.
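To make the second option in the comment concrete, here is a minimal sketch of how an application might wait on an "is recovery finished?" check before issuing reads. Everything in it is hypothetical: the class name RecoveryWait, the method awaitRecovery, and the BooleanSupplier probe are illustration only and are not part of DFSClient or any existing HDFS API. In a real client the probe would wrap whatever call DFS ends up exposing for this purpose.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not part of HDFS. Polls a recovery probe with a bounded number of attempts. */
public final class RecoveryWait {
    private RecoveryWait() {}

    /**
     * Polls isRecovered until it reports true or maxAttempts polls have been made.
     *
     * @param isRecovered hypothetical probe: true once lease recovery has completed
     * @param maxAttempts maximum number of polls (must be at least 1)
     * @param pauseMillis sleep between polls, in milliseconds
     * @return true if recovery was observed, false on timeout or interruption
     */
    public static boolean awaitRecovery(BooleanSupplier isRecovered, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isRecovered.getAsBoolean()) {
                return true; // safe to start reading with fresh block info
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                    return false;
                }
            }
        }
        return false; // recovery not observed within the allowed attempts
    }
}
```

A caller would only open the file for reading once awaitRecovery returns true, and would otherwise back off or report the timeout. This shows the application-side polling approach only; it does not address the client-side behaviour described in the issue, where a failed read falls back to the stale block view.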