Date: Tue, 14 Aug 2012 08:00:40 +1100 (NCT)
From: "Eli Collins (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <504973585.3888.1344891640505.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HDFS-3787) BlockManager#close races with ReplicationMonitor#run

    [ https://issues.apache.org/jira/browse/HDFS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433530#comment-13433530 ]

Eli Collins commented on HDFS-3787:
-----------------------------------

I kicked the pre-commit build manually.

> BlockManager#close races with ReplicationMonitor#run
> ----------------------------------------------------
>
>                 Key: HDFS-3787
>                 URL: https://issues.apache.org/jira/browse/HDFS-3787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andy Isaacson
>            Assignee: Andy Isaacson
>            Priority: Minor
>         Attachments: hdfs-3787-2.txt, hdfs-3787-2.txt, hdfs-3787.txt
>
>
> We saw {{TestDirectoryScanner}} fail during shutdown:
> {code}
> 2012-08-09 12:17:19,844 WARN  datanode.DataNode (BPServiceActor.java:run(683)) - Ending block pool service for: Block pool BP-610123021-172.29.121.238-1344539835759 (storage id DS-1581877160-172.29.121.238-43609-1344539837880) service to localhost/127.0.0.1:40012
> ...
> 2012-08-09 12:17:19,876 FATAL blockmanagement.BlockManager (BlockManager.java:run(3039)) - ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1116)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3070)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3032)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> Inspecting the code, it appears that {{BlockManager#close -> BlocksMap#close}} can set {{blocks}} to {{null}} while {{computeDatanodeWork}} is running.
> The fix seems simple -- have {{close}} just set an exit flag, and have {{ReplicationMonitor#run}} call {{BlocksMap#close}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
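
The fix proposed in the quoted description (have close only set an exit flag, and let ReplicationMonitor#run call BlocksMap#close) could look roughly like the sketch below. This is not the attached patch and not the real Hadoop code: BlocksMapSketch, BlockManagerSketch, shouldRun, activate, monitorFailure and isBlocksMapClosed are hypothetical names chosen for illustration, and the loop body only stands in for computeDatanodeWork.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for BlocksMap: close() drops the backing map,
// so any lookup after close() would throw NullPointerException.
class BlocksMapSketch {
    private Map<Long, String> blocks = new HashMap<>();

    void put(long blockId, String collection) {
        blocks.put(blockId, collection);
    }

    String getBlockCollection(long blockId) {
        return blocks.get(blockId); // NPE if close() has already run
    }

    void close() {
        blocks = null;
    }

    boolean isClosed() {
        return blocks == null;
    }
}

// Hypothetical stand-in for BlockManager plus its ReplicationMonitor thread.
class BlockManagerSketch {
    private final BlocksMapSketch blocksMap = new BlocksMapSketch();
    private volatile boolean shouldRun = true;   // the "exit flag"
    private volatile Throwable monitorFailure;   // set if the monitor loop dies
    private final Thread monitor =
        new Thread(this::runMonitor, "ReplicationMonitorSketch");

    BlockManagerSketch() {
        blocksMap.put(1L, "file-1");
    }

    void activate() {
        monitor.start();
    }

    private void runMonitor() {
        try {
            while (shouldRun) {
                blocksMap.getBlockCollection(1L); // stands in for computeDatanodeWork
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    // Woken by close(); the loop condition re-checks the flag.
                }
            }
        } catch (RuntimeException e) {
            monitorFailure = e;
        } finally {
            // Only the monitor thread closes the map, after its last lookup.
            blocksMap.close();
        }
    }

    // close() no longer touches the map; it signals the monitor and waits.
    void close() {
        shouldRun = false;
        monitor.interrupt();
        try {
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    Throwable monitorFailure() {
        return monitorFailure;
    }

    boolean isBlocksMapClosed() {
        return blocksMap.isClosed();
    }
}
```

Because the map is nulled on the same thread that reads it, the lookup and the close cannot interleave. One limitation of this sketch: if close() is called without activate(), the monitor never runs and the map is never closed.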