Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <874295223.17021270749336774.JavaMail.jira@brutus.apache.org>
Date: Thu, 8 Apr 2010 17:55:36 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
In-Reply-To: <843484796.396631269216807264.JavaMail.jira@brutus.apache.org>

    [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855050#action_12855050 ]

Todd Lipcon commented on HDFS-1057:
-----------------------------------

Solution 2 should work, but we still need to extend the recovery code to recompute the last chunk checksum during rbw recovery, right? (i.e. a DN could crash in between flushing the data and the metadata.)

> Concurrent readers hit ChecksumExceptions if following a writer to very end of file
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> In BlockReceiver.receivePacket, replicaInfo.setBytesOnDisk is called before flush(). If there is a concurrent reader, this is a race: the reader sees the new length while those bytes are still sitting in BlockReceiver's buffers, so the client can hit checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not yet stable.
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
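
For context, below is a minimal sketch of the ordering the comment and the issue description argue for: write and flush a packet's data and checksum bytes first, and only then advertise the new readable length (paired with the checksum of the last, possibly partial, chunk) to concurrent readers. This is an illustration under stated assumptions, not the actual BlockReceiver code or the patch that resolved HDFS-1057; names such as PacketWriterSketch, ReplicaSketch, setLastChecksumAndDataLen, dataOut and checksumOut are placeholders.

import java.io.IOException;
import java.io.OutputStream;

/**
 * Illustrative sketch only -- not the real BlockReceiver or the HDFS-1057 patch.
 * Shows the safe ordering: persist the packet before publishing the new length.
 */
class PacketWriterSketch {

  private final OutputStream dataOut;      // assumed: buffered stream over the block file
  private final OutputStream checksumOut;  // assumed: buffered stream over the meta file
  private final ReplicaSketch replicaInfo; // stand-in for the replica a reader consults

  PacketWriterSketch(OutputStream dataOut, OutputStream checksumOut, ReplicaSketch replicaInfo) {
    this.dataOut = dataOut;
    this.checksumOut = checksumOut;
    this.replicaInfo = replicaInfo;
  }

  void receivePacket(byte[] data, byte[] checksums, long newLength,
                     byte[] lastChunkChecksum) throws IOException {
    // 1. Write and flush the bytes and their checksums before any reader can
    //    see the new length.
    dataOut.write(data);
    checksumOut.write(checksums);
    dataOut.flush();
    checksumOut.flush();

    // 2. Only now advertise the new length, atomically paired with the checksum
    //    of the unstable last chunk, so a reader that stops exactly at this
    //    length can verify the tail even while the writer keeps appending.
    replicaInfo.setLastChecksumAndDataLen(newLength, lastChunkChecksum);
  }
}

/** Minimal stand-in for the replica metadata shared between writer and readers. */
class ReplicaSketch {
  private long bytesOnDisk;
  private byte[] lastChunkChecksum;

  synchronized void setLastChecksumAndDataLen(long len, byte[] checksum) {
    this.bytesOnDisk = len;
    this.lastChunkChecksum = checksum == null ? null : checksum.clone();
  }

  synchronized long getBytesOnDisk() {
    return bytesOnDisk;
  }

  synchronized byte[] getLastChunkChecksum() {
    return lastChunkChecksum == null ? null : lastChunkChecksum.clone();
  }
}

Under this ordering a reader that stops at getBytesOnDisk() never reads bytes still sitting in the writer's buffers. As the comment points out, rbw recovery after a datanode crash would still need to recompute the last chunk checksum from the data file, since the crash can land between the data flush and the metadata update.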