From: Harsh J
Date: Mon, 2 Jul 2012 15:51:36 +0530
Subject: Re: Namenode hangs on startup
To: hdfs-user@hadoop.apache.org

Jianhui,

Can you post the output of your "jstack " command to pastebin.com after
it has hung, and pass us the paste link please? It looks to me like it
may have just been merging/saving the image, which can be slow, but
that depends on how long you had to wait around to see the NN resume
and start up properly.

On Mon, Jul 2, 2012 at 2:34 PM, Jianhui Zhang wrote:
> Hi,
>
> Apache Hadoop 0.20.205.
>
> I'm trying to restart NN and it always hangs at the very beginning.
> The only logs I've got are:
>
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG: host = host/ip
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.205.0
> STARTUP_MSG: build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-205
> -r 1179940; compiled by 'hortonfo' on Fri Oct 7 06:20:32 UTC 2011
> ************************************************************/
> 2012-07-02 01:33:01,281 INFO
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 2012-07-02 01:33:01,290 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> MetricsSystem,sub=Stats registered.
> 2012-07-02 01:33:01,292 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
> period at 10 second(s).
> 2012-07-02 01:33:01,292 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
> system started
> 2012-07-02 01:33:01,434 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> ugi registered.
> 2012-07-02 01:33:01,436 WARN
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
> already exists!
> 2012-07-02 01:33:01,441 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> jvm registered.
> 2012-07-02 01:33:01,441 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> NameNode registered.
> 2012-07-02 01:33:01,463 INFO org.apache.hadoop.hdfs.util.GSet: VM type
> = 64-bit
> 2012-07-02 01:33:01,463 INFO org.apache.hadoop.hdfs.util.GSet: 2% max
> memory = 314.0275 MB
> 2012-07-02 01:33:01,463 INFO org.apache.hadoop.hdfs.util.GSet:
> capacity = 2^25 = 33554432 entries
> 2012-07-02 01:33:01,463 INFO org.apache.hadoop.hdfs.util.GSet:
> recommended=33554432, actual=33554432
> 2012-07-02 01:33:01,546 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=owner
> 2012-07-02 01:33:01,546 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> supergroup=supergroup
> 2012-07-02 01:33:01,546 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2012-07-02 01:33:01,550 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> dfs.block.invalidate.limit=100
> 2012-07-02 01:33:01,550 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
> accessTokenLifetime=0 min(s)
> 2012-07-02 01:33:01,787 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStateMBean and NameNodeMXBean
> 2012-07-02 01:33:01,802 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> 2012-07-02 01:33:01,811 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 17032
> 2012-07-02 01:33:02,406 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
> construction = 0
> 2012-07-02 01:33:02,406 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Image file of size
> 2553316 loaded in 0 seconds.
> 2012-07-02 01:33:02,410 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Edits file
> /apr/hdfs/name/current/edits of size 498 edits # 7 loaded in 0
> seconds.
>
> ====================================
>
> It hangs thereafter.... I wonder if anybody has seen this before?
>
> Some background: I shut down DFS and MR while there were still jobs
> running. Some MR jobs were hanging, so I manually killed the children
> JVMs after the shutdown. Not sure how such actions would affect NN
> startup.
>
> Any help would be appreciated.
>
> Thanks,
> James

--
Harsh J