Date: Mon, 5 Jun 2017 19:10:04 +0000 (UTC)
From: "Wei-Chiu Chuang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11472) Fix inconsistent replica size after a data pipeline failure

    [ https://issues.apache.org/jira/browse/HDFS-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037419#comment-16037419 ]

Wei-Chiu Chuang commented on HDFS-11472:
----------------------------------------

Hi [~xkrogen]. Thanks for the comment. I think there is no harm in adding the extra warning, as it might still be possible for a similar error to creep in even after this fix. I am not so sure about resetting the replica's in-memory last chunk checksum (LCC).
After the recovery initiated by {{initReplicaRecoveryImpl}}, the block may be read, and if the LCC does not match the data in the last chunk, the reader would erroneously believe the block is corrupt, which defeats the purpose of this fix. The reason {{recoverRbwImpl}} resets the replica's in-memory LCC to null is that after that recovery the block is immediately written to again, so the LCC would not match the chunk data anyway (the LCC is updated once the block is finalized), and there is little benefit in making it correct.

> Fix inconsistent replica size after a data pipeline failure
> ------------------------------------------------------------
>
>                 Key: HDFS-11472
>                 URL: https://issues.apache.org/jira/browse/HDFS-11472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Critical
>              Labels: release-blocker
>         Attachments: HDFS-11472.001.patch, HDFS-11472.002.patch, HDFS-11472.003.patch, HDFS-11472.testcase.patch
>
>
> We observed a case where a replica's on-disk length is less than its acknowledged length, breaking an assumption in the recovery code.
> {noformat}
> 2017-01-08 01:41:03,532 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain replica info for block (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
> java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
>   getNumBytes()     = 27530
>   getBytesOnDisk()  = 27006
>   getVisibleLength()= 27268
>   getVolume()       = /data/6/hdfs/datanode/current
>   getBlockFile()    = /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
>   bytesAcked=27268
>   bytesOnDisk=27006
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It turns out that if an exception is thrown within {{BlockReceiver#receivePacket}}, the in-memory replica's on-disk length may not be updated, even though the data has already been written to disk.
> For example, here's one exception we observed:
> {noformat}
> 2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
> java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> There are potentially other places and causes where an exception can be thrown within {{BlockReceiver#receivePacket}}, so it may not make much sense to work around this particular exception alone. Instead, we should improve the replica recovery code to handle the case where the on-disk size is less than the acknowledged size, and update the in-memory checksum accordingly.
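A minimal, hypothetical sketch of the ordering problem described above. This is not the actual {{BlockReceiver}} code, and the class, field, and method names are invented purely for illustration: the packet data reaches the block file before the in-memory length is advanced, so an exception thrown in between (such as the {{ClosedByInterruptException}} in the log) leaves the in-memory count behind what is on disk.

{code:java}
import java.io.IOException;
import java.io.OutputStream;

// Illustrative only; not the real BlockReceiver.
class PacketWriterSketch {
  private final OutputStream blockOut; // stream backing the block file on disk
  private long bytesOnDisk;            // in-memory view of the on-disk length

  PacketWriterSketch(OutputStream blockOut) {
    this.blockOut = blockOut;
  }

  void writePacket(byte[] data) throws IOException {
    blockOut.write(data);          // 1. the packet data reaches the block file
    adjustChecksumPosition();      // 2. may throw, e.g. if the thread is interrupted
    bytesOnDisk += data.length;    // 3. never runs if step 2 threw, so the
                                   //    in-memory length now lags the file
  }

  private void adjustChecksumPosition() throws IOException {
    // placeholder for the checksum-file repositioning that failed in the log above
  }

  long getBytesOnDisk() {
    return bytesOnDisk;
  }
}
{code}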
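And a minimal sketch of the kind of recovery-side handling the description proposes: clamp the recovery length to the bytes actually on disk and recompute the last chunk checksum from the block file, so that later readers do not flag the replica as corrupt. This is not the attached patch; the fixed chunk size and the use of {{java.util.zip.CRC32}} instead of the datanode's own checksum classes are simplifications to keep the sketch self-contained.

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

// Illustrative only; not the HDFS-11472 patch.
public class RbwRecoverySketch {

  // Assumed chunk size; the datanode actually reads bytes-per-checksum
  // from the block's metadata header.
  private static final int BYTES_PER_CHECKSUM = 512;

  // Never trust an acknowledged length that exceeds what actually reached
  // the disk: recover up to the smaller of the two.
  static long recoveryLength(long bytesAcked, long bytesOnDisk) {
    return Math.min(bytesAcked, bytesOnDisk);
  }

  // Recompute the checksum of the last (possibly partial) chunk directly
  // from the on-disk block file, so the in-memory last chunk checksum
  // matches what a reader will verify against.
  static long recomputeLastChunkChecksum(File blockFile, long bytesOnDisk)
      throws IOException {
    long lastChunkStart = (bytesOnDisk / BYTES_PER_CHECKSUM) * BYTES_PER_CHECKSUM;
    int lastChunkLen = (int) (bytesOnDisk - lastChunkStart);
    if (lastChunkLen == 0 && bytesOnDisk > 0) {
      // bytesOnDisk is a multiple of the chunk size: the last chunk is full.
      lastChunkStart -= BYTES_PER_CHECKSUM;
      lastChunkLen = BYTES_PER_CHECKSUM;
    }
    byte[] buf = new byte[lastChunkLen];
    try (RandomAccessFile raf = new RandomAccessFile(blockFile, "r")) {
      raf.seek(lastChunkStart);
      raf.readFully(buf);
    }
    CRC32 crc = new CRC32();
    crc.update(buf, 0, buf.length);
    return crc.getValue();
  }
}
{code}

The point of the sketch is that the recovered length never exceeds {{bytesOnDisk}}, which restores the invariant the recovery code asserts in the log above, and that the in-memory checksum is derived from the data actually on disk rather than left stale.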