Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D596F19DF5 for ; Thu, 28 Apr 2016 17:55:14 +0000 (UTC) Received: (qmail 62586 invoked by uid 500); 28 Apr 2016 17:55:14 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 62384 invoked by uid 500); 28 Apr 2016 17:55:14 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 62121 invoked by uid 99); 28 Apr 2016 17:55:13 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Apr 2016 17:55:13 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 2CBB72C1F64 for ; Thu, 28 Apr 2016 17:55:13 +0000 (UTC) Date: Thu, 28 Apr 2016 17:55:13 +0000 (UTC) From: "Hudson (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262619#comment-15262619 ] Hudson commented on HDFS-9958: ------------------------------ FAILURE: Integrated in Hadoop-trunk-Commit #9688 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9688/]) HDFS-9958. BlockManager#createLocatedBlocks can throw NPE for (kihwal: rev 6243eabb48390fffada2418ade5adf9e0766afbe) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCorruption.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages. > ------------------------------------------------------------------------------------ > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 2.7.2 > Reporter: Kuhu Shukla > Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is taken out of blocksMap, there is a race which causes the creation of LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the case where corrupt replica is on failed storage will amount to numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of the storage. Therefore numMachines will include such (failed) nodes. The assert would fail only if the system is enabled to catch Assertion errors, otherwise it goes ahead and tries to create LocatedBlock object for that is not put in the {{machines}} array. > Here is the stack trace: > {code} > java.lang.NullPointerException > at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)