Subject: Re: A question about Hmaster startup.
From: Stack <saint.ack@gmail.com>
To: user@hbase.apache.org
Date: Tue, 19 Apr 2011 08:20:03 -0700

Mind making an issue and a patch? We can apply it for 0.90.3, which
should be out soon. Thank you Gaojinchao.
St.Ack

2011/4/19 Gaojinchao <gaojinchao@huawei.com>:
> I think it needs a fix, because HMaster can't start up even when the DN is up.
>
> Can the deleted code be restored?
>
> HMaster logs:
>
> 2011-04-19 16:49:09,208 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: A master is now available
> 2011-04-19 16:49:09,400 WARN org.apache.hadoop.hbase.util.FSUtils: Version file was empty, odd, will try to set it.
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1310)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:817)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>         at $Proxy5.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy5.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/hbase.version" - Aborting...
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at hdfs://C4C1:9000/hbase, retrying: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1310)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:817)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>         at $Proxy5.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy5.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> 2011-04-19 16:56:19,695 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at hdfs://C4C1:9000/hbase, retrying: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/hbase.version for DFSClient_hb_m_C4C1.site:60000_1303202948768 on client 157.5.100.1 because current leaseholder is trying to recreate file.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1068)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:407)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:817)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>         at $Proxy5.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy5.create(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2759)
>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:496)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:526)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:507)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:414)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:406)
>         at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:255)
>         at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:239)
>         at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:199)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:246)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:106)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:91)
>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:347)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
>
> -----Original Message-----
> From: Gaojinchao [mailto:gaojinchao@huawei.com]
> Sent: 19 April 2011 15:16
> To: user@hbase.apache.org
> Subject: re: A question about Hmaster startup.
>
> It reproduces when HMaster is started for the first time and the NN is started without starting the DN.
> So, it may be nothing.
>
> HBase version 0.90.1:
>   public static void waitOnSafeMode(final Configuration conf,
>     final long wait)
>   throws IOException {
>     FileSystem fs = FileSystem.get(conf);
>     if (!(fs instanceof DistributedFileSystem)) return;
>     DistributedFileSystem dfs = (DistributedFileSystem)fs;
>     // Are there any data nodes up yet?
>     // Currently the safe mode check falls through if the namenode is up but no
>     // datanodes have reported in yet.
>     try {                                  // This code is deleted
>       while (dfs.getDataNodeStats().length == 0) {
>         LOG.info("Waiting for dfs to come up...");
>         try {
>           Thread.sleep(wait);
>         } catch (InterruptedException e) {
>           // continue
>         }
>       }
>     } catch (IOException e) {
>       // getDataNodeStats can fail if superuser privilege is required to run
>       // the datanode report, just ignore it
>     }
>     // Make sure dfs is not in safe mode
>     while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
>       LOG.info("Waiting for dfs to exit safe mode...");
>       try {
>         Thread.sleep(wait);
>       } catch (InterruptedException e) {
>         // continue
>       }
>     }
>   }
>
> HBase version 0.90.2:
>
>   public static void waitOnSafeMode(final Configuration conf,
>     final long wait)
>   throws IOException {
>     FileSystem fs = FileSystem.get(conf);
>     if (!(fs instanceof DistributedFileSystem)) return;
>     DistributedFileSystem dfs = (DistributedFileSystem)fs;
>     // Make sure dfs is not in safe mode
>     while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
>       LOG.info("Waiting for dfs to exit safe mode...");
>       try {
>         Thread.sleep(wait);
>       } catch (InterruptedException e) {
>         // continue
>       }
>     }
>   }
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On behalf of Stack
> Sent: 19 April 2011 13:15
> To: user@hbase.apache.org
> Subject: Re: A question about Hmaster startup.
>
> On Mon, Apr 18, 2011 at 9:26 PM, Gaojinchao wrote:
>> Sorry.
>> My question is:
>> If HMaster is started after the NN without starting the DN in HBase 0.90.2, then HMaster is not able to start, due to an AlreadyBeingCreatedException for /hbase/hbase.version.
>> In HBase version 0.90.1, it would wait for the data node to start up.
>>
>> I tried to dig into the code and found that it changed in HBase version 0.90.2, but I can't find an issue for this change.
>>
>
> Thanks for digging in.
>
> I don't see the code block you are referring to in HMaster in 0.90.1.
> As per J-D, it's out in FSUtils.java when we get to 0.90 (I checked
> 0.90.0 and it's not there either).
>
> What you are seeing seems similar to:
>
> HBASE-3502 Can't open region because can't open .regioninfo because
> AlreadyBeingCreatedException
>
> .... except in your case it's hbase.version. Is there another master
> running by chance that still has the lease on this file?
>
> Looking at the code, it should be doing as it used to. We go into
> checkRootDir, and the first thing we call is FSUtils.waitOnSafeMode;
> then we just hang there until dfs says it's left safe mode.
>
> Maybe add some logging in there?
>
> St.Ack
>
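[Editor's note: the behavior difference discussed above can be modeled without a running Hadoop cluster. The sketch below is illustrative only and does not use the real HBase/Hadoop API: the ClusterProbe interface is a hypothetical stand-in for DistributedFileSystem's getDataNodeStats() and setSafeMode(SAFEMODE_GET) calls. It shows why the loop deleted in 0.90.2 mattered: the 0.90.1 master blocked until at least one datanode had reported in before it touched /hbase/hbase.version, whereas 0.90.2 only waits for safe mode to clear.]

```java
// Minimal sketch, not HBase code: models FSUtils.waitOnSafeMode from
// 0.90.1 against a fake cluster probe. All names here are hypothetical.
public class WaitOnSafeModeSketch {

    interface ClusterProbe {
        int liveDataNodes();   // stand-in for dfs.getDataNodeStats().length
        boolean inSafeMode();  // stand-in for dfs.setSafeMode(SAFEMODE_GET)
    }

    static int waitOnSafeMode(ClusterProbe probe, long waitMillis) {
        int polls = 0;
        // The 0.90.1-style wait: block until a datanode reports in.
        // Without this loop the master writes hbase.version while no
        // datanode can accept the block; the create is then retried and
        // the NameNode answers with AlreadyBeingCreatedException because
        // the first (failed) attempt still holds the lease.
        while (probe.liveDataNodes() == 0) {
            polls++;
            sleepQuietly(waitMillis);
        }
        // Both 0.90.1 and 0.90.2 then wait for safe mode to clear.
        while (probe.inSafeMode()) {
            polls++;
            sleepQuietly(waitMillis);
        }
        return polls;  // number of wait iterations, for observability
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // continue, as the original code did
        }
    }
}
```

As a usage illustration, a probe that reports a datanode only on its third poll, and safe mode clearing one poll later, makes waitOnSafeMode spin three times before returning, mirroring the master blocking at startup until the DN is up.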