Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 1A6CF200C1C for ; Wed, 15 Feb 2017 07:59:59 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 1790B160B5E; Wed, 15 Feb 2017 06:59:59 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 6280E160B46 for ; Wed, 15 Feb 2017 07:59:58 +0100 (CET) Received: (qmail 25316 invoked by uid 500); 15 Feb 2017 06:59:57 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 25304 invoked by uid 99); 15 Feb 2017 06:59:57 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Feb 2017 06:59:57 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id C4C291A5F0E for ; Wed, 15 Feb 2017 06:59:56 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -1.199 X-Spam-Level: X-Spam-Status: No, score=-1.199 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, RP_MATCHES_RCVD=-2.999] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id OwGYdRlC2kHr for ; Wed, 15 Feb 2017 06:59:55 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 6B6CC5F4E5 for ; Wed, 15 Feb 2017 06:59:55 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id B30E0E0485 for ; Wed, 15 Feb 2017 06:59:42 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id B84252411D for ; Wed, 15 Feb 2017 06:59:41 +0000 (UTC) Date: Wed, 15 Feb 2017 06:59:41 +0000 (UTC) From: "Brahma Reddy Battula (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception" MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 15 Feb 2017 06:59:59 -0000 [ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867058#comment-15867058 ] Brahma Reddy Battula edited comment on HDFS-6804 at 2/15/17 6:58 AM: --------------------------------------------------------------------- Attaching the reproduce patch for this scenario.. was (Author: brahmareddy): Attaching the reproduce scenario for this scenario.. > race condition between transferring block and appending block causes "Unexpected checksum mismatch exception" > -------------------------------------------------------------------------------------------------------------- > > Key: HDFS-6804 > URL: https://issues.apache.org/jira/browse/HDFS-6804 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Affects Versions: 2.2.0 > Reporter: Gordon Wang > Assignee: Wei-Chiu Chuang > Attachments: Testcase_append_transfer_block.patch > > > We found some error log in the datanode. like this > {noformat} > 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Ex > ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 > java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from /192.168.2.101:39495 > at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536) > at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703) > at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575) > at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) > at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) > at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:744) > {noformat} > While on the source datanode, the log says the block is transmitted. > {noformat} > 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Da > taTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997 > _9248 (numBytes=16188152) to /192.168.2.103:50010 > {noformat} > When the destination datanode gets the checksum mismatch, it reports bad block to NameNode and NameNode marks the replica on the source datanode as corrupt. But actually, the replica on the source datanode is valid. Because the replica can pass the checksum verification. > In all, the replica on the source data is wrongly marked as corrupted. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org