From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Tue, 10 Jul 2007 15:29:04 -0700 (PDT)
Subject: [jira] Updated: (HADOOP-1486) ReplicationMonitor thread goes away
Message-ID: <6380887.1184106544747.JavaMail.jira@brutus>
In-Reply-To: <721225.1181666665932.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1486:
-------------------------------------

    Status: Patch Available  (was: Open)

Exit the
Namenode when the ReplicationMonitor thread encounters a RuntimeException. It would have been nice to be able to restart the namenode within the context of the same JVM, but a lot of work is needed to gracefully release all previously allocated resources.

> ReplicationMonitor thread goes away
> ------------------------------------
>
>                 Key: HADOOP-1486
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1486
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Koji Noguchi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: namenodeRestart2.patch
>
>
> Saw many over/under replicated blocks in fsck output.
> .out file showed
> Exception in thread "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@2785982c" java.lang.IllegalArgumentException: Unexpected non-existing data node: /99.9.99.0/99.9.99.42:99999
>         at org.apache.hadoop.net.NetworkTopology.checkArgument(NetworkTopology.java:379)
>         at org.apache.hadoop.net.NetworkTopology.isOnSameRack(NetworkTopology.java:424)
>         at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2853)
>         at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.chooseTarget(FSNamesystem.java:2816)
>         at org.apache.hadoop.dfs.FSNamesystem.pendingTransfers(FSNamesystem.java:2658)
>         at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1774)
>         at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1723)
>         at java.lang.Thread.run(Thread.java:619)
> (same as HADOOP-1232)
> And, jstack showed no ReplicationMonitor thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
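
For readers of the archive, the approach described in the update (stop the whole process when the monitor thread hits a RuntimeException, instead of letting the thread vanish silently) might look roughly like the sketch below. It is not taken from namenodeRestart2.patch, whose contents are not reproduced in this message. The class MonitorSketch, its fields, and the onFatal callback are all hypothetical names chosen for illustration. In a real daemon onFatal would presumably call something like System.exit; here it is injectable so the demo can run without terminating the JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a periodic monitor thread; not Hadoop code. */
final class MonitorSketch implements Runnable {
    private final Runnable work;     // one round of periodic work (assumed)
    private final Runnable onFatal;  // action on unexpected failure, e.g. () -> System.exit(1)
    private final long intervalMs;   // pause between rounds
    private final AtomicBoolean running = new AtomicBoolean(true);

    MonitorSketch(Runnable work, Runnable onFatal, long intervalMs) {
        this.work = work;
        this.onFatal = onFatal;
        this.intervalMs = intervalMs;
    }

    /** Requests a clean stop after the current round. */
    void stop() {
        running.set(false);
    }

    @Override
    public void run() {
        try {
            while (running.get()) {
                work.run();
                Thread.sleep(intervalMs);
            }
        } catch (InterruptedException ie) {
            // Treated as a normal shutdown request; restore the interrupt flag.
            Thread.currentThread().interrupt();
        } catch (RuntimeException e) {
            // Without this handler the thread would end quietly and the
            // rest of the process would keep running without its monitor.
            System.err.println("Monitor thread failed: " + e);
            onFatal.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean exitRequested = new AtomicBoolean(false);
        MonitorSketch monitor = new MonitorSketch(
            () -> { throw new IllegalArgumentException("Unexpected non-existing data node"); },
            () -> exitRequested.set(true),  // a real daemon might exit the JVM here
            10L);
        Thread t = new Thread(monitor, "MonitorSketch");
        t.start();
        t.join();
        System.out.println("exit requested: " + exitRequested.get());
    }
}
```

Passing the failure action in as a Runnable is only a convenience for this sketch: it keeps the fail-fast decision visible in one place and lets the behaviour be exercised without killing the process.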