From: "sam rash (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Thu, 24 Jun 2010 19:55:50 -0400 (EDT)
Subject: [jira] Commented: (HDFS-1186) 0.20: DNs should interrupt writers at start of recovery
Message-ID: <28410020.49931277423750875.JavaMail.jira@thor>
In-Reply-To: <8074260.166471275595439491.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HDFS-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882401#action_12882401 ]

sam rash commented on HDFS-1186:
--------------------------------

hmm, I wonder why only 1? If the client thinks there are 3 DNs in the pipeline and asks to recover 3, I think it should fail with fewer than 3. A client can request fewer if that works (in which case we do have to handle the problem you lay out).

So in your solution, you are saying that the lease holder, the client, needs to be contacted to verify that the primary is the only one doing lease recovery (or at least the latest one)?

> 0.20: DNs should interrupt writers at start of recovery
> -------------------------------------------------------
>
>                 Key: HDFS-1186
>                 URL: https://issues.apache.org/jira/browse/HDFS-1186
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20-append
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1186.txt
>
>
> When block recovery starts (e.g., due to the NN recovering the lease), it needs to interrupt any writers currently writing to those blocks. Otherwise, an old writer (who hasn't realized he lost his lease) can continue to write+sync to the blocks, and thus recovery ends up truncating data that has been sync()ed.
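
For readers following the thread, the behavior the issue description asks for can be sketched roughly as below. This is only an illustration, not the attached patch or the actual DataNode code: the class WriterRegistry and its methods register, unregister, and startRecovery are hypothetical names, and the sketch assumes one writer thread per block and a positive join timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: tracks which thread is writing to each block so that
 * recovery can stop the writer before it touches the block files.
 * None of these names come from the HDFS code base.
 */
final class WriterRegistry {
    // block id -> thread currently writing to that block (assumed: at most one)
    private final Map<Long, Thread> activeWriters = new ConcurrentHashMap<>();

    /** A writer thread registers itself before it starts writing to a block. */
    void register(long blockId, Thread writer) {
        activeWriters.put(blockId, writer);
    }

    /** A writer removes itself when it finishes normally. */
    void unregister(long blockId, Thread writer) {
        activeWriters.remove(blockId, writer);
    }

    /**
     * Called at the start of recovery for a block. Interrupts the registered
     * writer, if any, and waits up to joinTimeoutMs (assumed > 0) for it to exit.
     * Returns true if a writer was interrupted and has stopped.
     */
    boolean startRecovery(long blockId, long joinTimeoutMs) {
        Thread writer = activeWriters.remove(blockId);
        if (writer == null) {
            return false; // nobody is writing; recovery can proceed
        }
        writer.interrupt();
        try {
            writer.join(joinTimeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("interrupted while waiting for writer", e);
        }
        if (writer.isAlive()) {
            // Proceeding now could truncate data the old writer still syncs.
            throw new IllegalStateException("writer for block " + blockId + " did not stop");
        }
        return true;
    }
}
```

The point of joining after the interrupt is that recovery should not read or truncate the block until the old writer has actually stopped; otherwise a writer that has not yet noticed it lost its lease could still sync data that recovery then cuts off.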