Date: Fri, 31 Jul 2015 22:57:05 +0000 (UTC)
From: "Chris Trezzo (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-8245) Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log.

     [ https://issues.apache.org/jira/browse/HDFS-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated HDFS-8245:
-------------------------------
    Labels: 2.6.1-candidate BB2015-05-TBR  (was: BB2015-05-TBR)

> Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log.
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8245
>                 URL: https://issues.apache.org/jira/browse/HDFS-8245
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>              Labels: 2.6.1-candidate, BB2015-05-TBR
>             Fix For: 2.7.1
>
>         Attachments: HDFS-8245-1.patch, HDFS-8245.patch
>
>
> The following series of events happened on the standby namenode:
> 2015-04-09 07:47:21,735 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode Active Namenode (ANN)
> 2015-04-09 07:58:01,858 \[Edit log tailer] INFO ha.EditLogTailer: Triggering log roll on remote NameNode ANN
> The following series of events happened on the active namenode:
> 2015-04-09 07:47:21,747 \[IPC Server handler 99 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from Standby NN (SNN)
> 2015-04-09 07:58:01,868 \[IPC Server handler 18 on 8020] INFO namenode.FSNamesystem: Roll Edit Log from SNN
> The following series of events happened on datanode ({color:red}datanodeA{color}):
> 2015-04-09 07:52:15,817 \[DataXceiver for client DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1 at /xxxx:51078 \[Receiving block BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Receiving BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867 src: /client:51078 dest: /{color:red}datanodeA:1004{color}
> 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace: src: /client:51078, dest: /{color:red}datanodeA:1004{color}, bytes: 20, op: HDFS_WRITE, cliID: DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1, offset: 0, srvID: 356a8a98-826f-446d-8f4c-ce288c1f0a75, blockid: BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, duration: 148948385
> 2015-04-09 07:52:15,969 \[PacketResponder: BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode: PacketResponder: BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-04-09 07:52:25,970 \[DataXceiver for client /{color:red} {color}:52827 \[Copying block BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867]] INFO datanode.DataNode: Copied BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867 to <{color:red}datanodeB{color}>:52827
> 2015-04-09 07:52:28,187 \[DataNode: heartbeating to ANN:8020] INFO impl.FsDatasetAsyncDiskService: Scheduling blk_1570321882_1102029183867 file /blk_1570321882 for deletion
> 2015-04-09 07:52:28,188 \[Async disk worker #1482 for volume ] INFO impl.FsDatasetAsyncDiskService: Deleted BP-595383232-xxxx-1360869396230 blk_1570321882_1102029183867 file /blk_1570321882
> Then we failed over for an upgrade, and the standby became active.
> When we ran an ls command on this file, we got the following exception:
> 15/04/09 22:07:39 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> java.io.IOException: Got error for OP_READ_BLOCK, self=/client:32947, remote={color:red}datanodeA:1004{color}, for file , for pool BP-595383232-xxxx-1360869396230 block 1570321882_1102029183867
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445)
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:815)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:351)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
>         at org.apache.hadoop.fs.shell.CopyCommands$Merge.processArguments(CopyCommands.java:97)
>         at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> Eventually this request was served by {color:red}datanodeB{color} on the second attempt:
> 15/04/09 22:07:39 INFO hdfs.DFSClient: Successfully connected to {color:red}datanodeB:1004{color} for BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867
> On further analysis, I found that when {color:red}datanodeA{color} notified the namenode after deleting that block, the then-ANN didn't queue the delete request in PendingDNMessages.
> So when the then-SNN read the edits, it didn't know that the block on {color:red}datanodeA{color} had been deleted.
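
The failure mode above hinges on how a standby namenode handles block reports for blocks it cannot resolve yet because the corresponding OP_ADD is still sitting in the edit log. Below is a minimal, illustrative Java sketch of the queue-and-replay pattern involved; the class and method names (PendingBlockMessages, queueReport, takeReports) are hypothetical simplifications for this example, not the actual Hadoop API (the real logic lives in the namenode's BlockManager and its pending-datanode-messages queue).

{code:java}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/**
 * Illustrative sketch (not the real Hadoop class): a standby that has
 * not yet replayed the OP_ADD for a block must not drop a DELETED_BLOCK
 * (or RECEIVED_BLOCK) report for it; the report is parked and replayed
 * once edit-log tailing makes the block known to the namespace.
 */
public class PendingBlockMessages {

    /** A reported replica state change from a datanode. */
    enum ReportedState { RECEIVED_BLOCK, DELETED_BLOCK }

    static final class Report {
        final String datanode;
        final ReportedState state;

        Report(String datanode, ReportedState state) {
            this.datanode = datanode;
            this.state = state;
        }
    }

    // Reports keyed by block ID, held until the namespace knows the block.
    private final Map<Long, Queue<Report>> pending = new HashMap<>();

    /** Park a report for a block the namespace cannot resolve yet. */
    public void queueReport(long blockId, String datanode, ReportedState state) {
        pending.computeIfAbsent(blockId, id -> new ArrayDeque<>())
               .add(new Report(datanode, state));
    }

    /**
     * Drain the parked reports for a block once edit-log replay has made
     * it known, so deletions are applied in the order they were reported.
     */
    public Queue<Report> takeReports(long blockId) {
        Queue<Report> q = pending.remove(blockId);
        return q != null ? q : new ArrayDeque<>();
    }
}
{code}

Under this pattern, datanodeA's DELETED_BLOCK report for blk_1570321882_1102029183867 would be parked rather than dropped, so after replaying the OP_ADD the former standby would no longer count datanodeA as a valid replica location and, after the failover, would not direct readers to the deleted replica.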