Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 69DB99ADF for ; Fri, 27 Jan 2012 20:50:35 +0000 (UTC)
Received: (qmail 37214 invoked by uid 500); 27 Jan 2012 20:50:35 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 37147 invoked by uid 500); 27 Jan 2012 20:50:34 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 37135 invoked by uid 99); 27 Jan 2012 20:50:34 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Jan 2012 20:50:34 +0000
X-ASF-Spam-Status: No, hits=-1996.4 required=5.0 tests=ALL_TRUSTED,FS_REPLICA,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Jan 2012 20:50:32 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 40633166120 for ; Fri, 27 Jan 2012 20:50:11 +0000 (UTC)
Date: Fri, 27 Jan 2012 20:50:11 +0000 (UTC)
From: "Todd Lipcon (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <415292433.1764.1327697411265.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1985997259.41976.1326603039586.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2791) If block report races with closing of file, replica is incorrectly marked corrupt
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by
ClamAV on apache.org

    [ https://issues.apache.org/jira/browse/HDFS-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195086#comment-13195086 ]

Todd Lipcon commented on HDFS-2791:
-----------------------------------

bq. I am coming to the conclusion that when a NN asks a DN to delete a replica, in addition to the bid and generation stamp, it should also include the state (RBW etc) known to the NN. The block is deleted only if it is in that state.

Good idea - I like this safeguard. But given that there are +1s on this patch here, I don't think the above safeguard is mutually exclusive either. So let's do both for extra safety. Assuming this patch still applies, I'll commit it momentarily.

> If block report races with closing of file, replica is incorrectly marked corrupt
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-2791
>                 URL: https://issues.apache.org/jira/browse/HDFS-2791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-2791-test.txt, hdfs-2791.txt, hdfs-2791.txt, hdfs-2791.txt, hdfs-2791.txt
>
>
> The following sequence of events results in a replica mistakenly marked corrupt:
> 1. Pipeline is open with 2 replicas
> 2. DN1 generates a block report but is slow in sending to the NN (e.g. some flaky network). It gets "stuck" right before the block report RPC.
> 3. Client closes the file.
> 4. DN2 is fast and sends blockReceived to the NN. NN marks the block as COMPLETE
> 5. DN1's block report proceeds, and includes the block in an RBW state.
> 6. (x) NN incorrectly marks the replica as corrupt, since it is an RBW replica on a COMPLETE block.

--
This message is automatically generated by JIRA.
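
To make the safeguard discussed in the comment concrete, here is a minimal, self-contained Java sketch of a state-conditional delete. It is not HDFS code and not the attached patch: `ConditionalDeleteSketch`, `DeleteRequest`, `DataNode`, `handleDelete`, and `naiveIsCorrupt` are all hypothetical names, the two-value `ReplicaState` enum is a simplification, and the sketch assumes DN1 has already finalized its replica by the time the delete request arrives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; none of these types exist in Hadoop under these names.
final class ConditionalDeleteSketch {

    // Simplified replica states (assumption: only the two that matter here).
    enum ReplicaState { RBW, FINALIZED }

    // Delete request carrying block id, generation stamp, and the replica
    // state the NN believes the DN holds (the proposed extra field).
    static final class DeleteRequest {
        final long blockId;
        final long genStamp;
        final ReplicaState expectedState;

        DeleteRequest(long blockId, long genStamp, ReplicaState expectedState) {
            this.blockId = blockId;
            this.genStamp = genStamp;
            this.expectedState = expectedState;
        }
    }

    static final class Replica {
        final long genStamp;
        ReplicaState state;

        Replica(long genStamp, ReplicaState state) {
            this.genStamp = genStamp;
            this.state = state;
        }
    }

    // Toy DataNode that stores replicas in memory.
    static final class DataNode {
        private final Map<Long, Replica> replicas = new HashMap<>();

        void addReplica(long blockId, long genStamp, ReplicaState state) {
            replicas.put(blockId, new Replica(genStamp, state));
        }

        void finalizeReplica(long blockId) {
            Replica r = replicas.get(blockId);
            if (r != null) {
                r.state = ReplicaState.FINALIZED;
            }
        }

        boolean hasReplica(long blockId) {
            return replicas.containsKey(blockId);
        }

        // Delete only if generation stamp AND state match what the NN expects.
        // Returns true if the replica was deleted.
        boolean handleDelete(DeleteRequest req) {
            Replica r = replicas.get(req.blockId);
            if (r == null || r.genStamp != req.genStamp || r.state != req.expectedState) {
                return false; // NN's view is stale; refuse to delete
            }
            replicas.remove(req.blockId);
            return true;
        }
    }

    // The check described in step 6: an RBW replica on a COMPLETE block
    // is treated as corrupt, even when the report was merely delayed.
    static boolean naiveIsCorrupt(boolean storedBlockComplete, ReplicaState reported) {
        return storedBlockComplete && reported == ReplicaState.RBW;
    }

    public static void main(String[] args) {
        DataNode dn1 = new DataNode();
        dn1.addReplica(1L, 1001L, ReplicaState.RBW);

        // Step 2: DN1 builds its block report while the replica is still RBW.
        ReplicaState reported = ReplicaState.RBW;

        // Steps 3-4: the file is closed, DN1 finalizes, NN marks the block COMPLETE.
        dn1.finalizeReplica(1L);
        boolean storedBlockComplete = true;

        // Steps 5-6: the stale report arrives and the naive check flags the replica.
        boolean flagged = naiveIsCorrupt(storedBlockComplete, reported);

        // NN asks DN1 to delete, including the state it knows about (RBW).
        boolean deleted = dn1.handleDelete(new DeleteRequest(1L, 1001L, reported));

        System.out.println("flagged corrupt: " + flagged + ", deleted: " + deleted);
    }
}
```

In this toy run the stale report still gets the replica flagged, but the delete is refused because the replica is no longer in the state the NN expected, so the good replica survives. That is why the safeguard complements a NN-side fix rather than replacing it.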