From hdfs-issues-return-15249-apmail-hadoop-hdfs-issues-archive=hadoop.apache.org@hadoop.apache.org Tue Mar 01 19:49:01 2011 Return-Path: Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: (qmail 31421 invoked from network); 1 Mar 2011 19:49:01 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 1 Mar 2011 19:49:01 -0000 Received: (qmail 49992 invoked by uid 500); 1 Mar 2011 19:49:00 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 49664 invoked by uid 500); 1 Mar 2011 19:49:00 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 49566 invoked by uid 99); 1 Mar 2011 19:49:00 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 Mar 2011 19:49:00 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 Mar 2011 19:48:58 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 356F74ABCD for ; Tue, 1 Mar 2011 19:48:37 +0000 (UTC) Date: Tue, 1 Mar 2011 19:48:37 +0000 (UTC) From: "Tsz Wo (Nicholas), SZE (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: <904030761.5895.1299008917215.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001062#comment-13001062 ] Tsz Wo (Nicholas), SZE commented on HDFS-1057: ---------------------------------------------- Hi Sam, TestFileConcurrentReader fails again, created HDFS-1679. It would be great if you could take a look. > Concurrent readers hit ChecksumExceptions if following a writer to very end of file > ----------------------------------------------------------------------------------- > > Key: HDFS-1057 > URL: https://issues.apache.org/jira/browse/HDFS-1057 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: data-node > Affects Versions: 0.20-append, 0.21.0, 0.22.0 > Reporter: Todd Lipcon > Assignee: sam rash > Priority: Blocker > Fix For: 0.20-append, 0.21.0, 0.22.0 > > Attachments: HDFS-1057-0.20-append.patch, conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, hdfs-1057-trunk-6.txt > > > In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it's possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira