Date: Fri, 9 Nov 2012 22:59:13 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <180675929.94308.1352501954047.JavaMail.jiratomcat@arcas>
In-Reply-To: <1452209453.65973.1347400927886.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HDFS-3921) NN will prematurely consider blocks missing when entering active state while still in safe mode

    [ https://issues.apache.org/jira/browse/HDFS-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494395#comment-13494395 ]

Todd Lipcon commented on HDFS-3921:
-----------------------------------

{code}
-      blockManager.clearQueues();
-      blockManager.processAllPendingDNMessages();
-      blockManager.processMisReplicatedBlocks();
+
+      if (!isInStartupSafeMode()) {
+        LOG.info("Reprocessing replication and invalidation queues");
+        blockManager.clearQueues();
+        blockManager.processAllPendingDNMessages();
+        blockManager.processMisReplicatedBlocks();
+      }
{code}

I'm not sure about this, but I think we may want to call {{processAllPendingDNMessages}} in both cases. Otherwise, we can end up with stuff in the pending/postponed queues even after we're in active mode, and it might never get processed, right?

Also, should we use {{safeMode.isPopulatingReplQueues()}} instead of {{isInStartupSafeMode}}? At the end of safe mode, it will have already started processing queues, in which case we still need to re-process, no?

> NN will prematurely consider blocks missing when entering active state while still in safe mode
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3921
>                 URL: https://issues.apache.org/jira/browse/HDFS-3921
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Stephen Chu
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3921.patch
>
>
> I shut down all the HDFS daemons in a Highly Available (automatic failover) cluster.
> Then I started one NN and it transitioned to active. No DNs were started, and I saw the red warning link on the NN web UI:
> WARNING : There are 36 missing blocks. Please check the logs or run fsck in order to identify the missing blocks.
> I clicked this to go to the corrupt_files.jsp page, which ran into the following error:
> {noformat}
> HTTP ERROR 500
> Problem accessing /corrupt_files.jsp. Reason:
>     Cannot run listCorruptFileBlocks because replication queues have not been initialized.
> Caused by:
> java.io.IOException: Cannot run listCorruptFileBlocks because replication queues have not been initialized.
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.listCorruptFileBlocks(FSNamesystem.java:5035)
> 	at org.apache.hadoop.hdfs.server.namenode.corrupt_005ffiles_jsp._jspService(corrupt_005ffiles_jsp.java:78)
> 	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> 	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> 	at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1039)
> 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> 	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> 	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> 	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> 	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> 	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> 	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> 	at org.mortbay.jetty.Server.handle(Server.java:326)
> 	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> 	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> 	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> 	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> 	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> 	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> 	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {noformat}
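
For illustration only, here is one possible reading of the two suggestions in the comment above: drain the pending DN messages on every transition to active, and gate the queue reprocessing on whether the replication queues are already being populated. This is a minimal sketch, not the actual patch. `ActiveTransitionSketch`, its fields, and `startActiveServices` are hypothetical stand-ins; only the three method names and the idea of `isPopulatingReplQueues` are borrowed from the diff and the comment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the NameNode state touched by the patch.
// None of this is the real HDFS FSNamesystem or BlockManager code.
class ActiveTransitionSketch {
    /** Records calls in order so the control flow can be inspected. */
    final List<String> calls = new ArrayList<>();
    /** Stand-in for the pending/postponed DataNode message queues. */
    final Queue<String> pendingDnMessages = new ArrayDeque<>();
    /** Stand-in for the value of safeMode.isPopulatingReplQueues(). */
    boolean populatingReplQueues;

    void clearQueues() {
        calls.add("clearQueues");
    }

    void processAllPendingDNMessages() {
        calls.add("processAllPendingDNMessages");
        pendingDnMessages.clear();
    }

    void processMisReplicatedBlocks() {
        calls.add("processMisReplicatedBlocks");
    }

    /** Assumed shape of the transition to active, per the comment. */
    void startActiveServices() {
        boolean reprocess = populatingReplQueues;
        if (reprocess) {
            clearQueues();
        }
        // Called in both cases so queued DN messages are not left behind.
        processAllPendingDNMessages();
        if (reprocess) {
            processMisReplicatedBlocks();
        }
    }

    public static void main(String[] args) {
        ActiveTransitionSketch nn = new ActiveTransitionSketch();
        nn.pendingDnMessages.add("postponed block report");
        nn.populatingReplQueues = false; // still early in safe mode
        nn.startActiveServices();
        System.out.println(nn.calls + " pending=" + nn.pendingDnMessages.size());
    }
}
```

In this sketch the original call order is kept when the queues are being populated. Whether that matches the intended fix would need to be checked against the final patch on the issue.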