hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: A question about Hmaster startup.
Date Tue, 19 Apr 2011 15:20:03 GMT
Mind making an issue and a patch?  We can apply it for 0.90.3, which
should be out soon.  Thank you Gaojinchao.
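
For reference, a rough sketch of what such a patch could look like,
assuming it simply restores the 0.90.1 datanode wait (quoted below)
ahead of the safe-mode check in FSUtils.waitOnSafeMode. A sketch only,
not a reviewed patch:

  public static void waitOnSafeMode(final Configuration conf,
    final long wait)
  throws IOException {
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) return;
    DistributedFileSystem dfs = (DistributedFileSystem)fs;
    // Wait until at least one datanode has reported in; otherwise the
    // later create of hbase.version fails with "could only be
    // replicated to 0 nodes, instead of 1".
    try {
      while (dfs.getDataNodeStats().length == 0) {
        LOG.info("Waiting for dfs to come up...");
        try {
          Thread.sleep(wait);
        } catch (InterruptedException e) {
          // continue waiting
        }
      }
    } catch (IOException e) {
      // getDataNodeStats can fail if superuser privilege is required
      // to run the datanode report; just ignore it
    }
    // Make sure dfs is not in safe mode
    while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
      LOG.info("Waiting for dfs to exit safe mode...");
      try {
        Thread.sleep(wait);
      } catch (InterruptedException e) {
        // continue waiting
      }
    }
  }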
St.Ack

2011/4/19 Gaojinchao <gaojinchao@huawei.com>:
> I think it needs a fix, because HMaster can't start up even after the DN comes up.
>
> Can the deleted code be restored?
>
> HMaster logs:
>
> 2011-04-19 16:49:09,208 DEBUG org.apache.hadoop.hbase.master.ActiveMasterManager: A master is now available
> 2011-04-19 16:49:09,400 WARN org.apache.hadoop.hbase.util.FSUtils: Version file was empty, odd, will try to set it.
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1310)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:817)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>        at $Proxy5.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/hbase.version" - Aborting...
> 2011-04-19 16:51:09,674 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at hdfs://C4C1:9000/hbase, retrying: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1310)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:817)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>        at $Proxy5.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3000)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2881)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> 2011-04-19 16:56:19,695 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at hdfs://C4C1:9000/hbase, retrying: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/hbase.version for DFSClient_hb_m_C4C1.site:60000_1303202948768 on client 157.5.100.1 because current leaseholder is trying to recreate file.
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1068)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:407)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:817)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>        at $Proxy5.create(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.create(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2759)
>        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:496)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:526)
>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:507)
>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:414)
>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:406)
>        at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:255)
>        at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:239)
>        at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:199)
>        at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:246)
>        at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:106)
>        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:91)
>        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:347)
>        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
>
> -----Original Message-----
> From: Gaojinchao [mailto:gaojinchao@huawei.com]
> Sent: April 19, 2011 15:16
> To: user@hbase.apache.org
> Subject: re: A question about Hmaster startup.
>
> It reproduces when HMaster is started for the first time and the NN is started without starting the DN.
> So it may be nothing serious.
>
> HBase version 0.90.1:
>  public static void waitOnSafeMode(final Configuration conf,
>    final long wait)
>  throws IOException {
>    FileSystem fs = FileSystem.get(conf);
>    if (!(fs instanceof DistributedFileSystem)) return;
>    DistributedFileSystem dfs = (DistributedFileSystem)fs;
>    // Are there any data nodes up yet?
>    // Currently the safe mode check falls through if the namenode is up but no
>    // datanodes have reported in yet.
>    try {                                   // this try/catch (the datanode wait) was deleted in 0.90.2
>      while (dfs.getDataNodeStats().length == 0) {
>        LOG.info("Waiting for dfs to come up...");
>        try {
>          Thread.sleep(wait);
>        } catch (InterruptedException e) {
>          //continue
>        }
>      }
>    } catch (IOException e) {
>      // getDataNodeStats can fail if superuser privilege is required to run
>      // the datanode report, just ignore it
>    }
>    // Make sure dfs is not in safe mode
>    while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
>      LOG.info("Waiting for dfs to exit safe mode...");
>      try {
>        Thread.sleep(wait);
>      } catch (InterruptedException e) {
>        //continue
>      }
>    }
>  }
>
> HBase version 0.90.2:
>
>  public static void waitOnSafeMode(final Configuration conf,
>    final long wait)
>  throws IOException {
>    FileSystem fs = FileSystem.get(conf);
>    if (!(fs instanceof DistributedFileSystem)) return;
>    DistributedFileSystem dfs = (DistributedFileSystem)fs;
>    // Make sure dfs is not in safe mode
>    while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
>      LOG.info("Waiting for dfs to exit safe mode...");
>      try {
>        Thread.sleep(wait);
>      } catch (InterruptedException e) {
>        //continue
>      }
>    }
>  }
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: April 19, 2011 13:15
> To: user@hbase.apache.org
> Subject: Re: A question about Hmaster startup.
>
> On Mon, Apr 18, 2011 at 9:26 PM, Gaojinchao <gaojinchao@huawei.com> wrote:
>> Sorry.
>> My question is:
>> If HMaster is started after the NN without starting the DN in HBase 0.90.2, then HMaster is not able to start due to an AlreadyBeingCreatedException for /hbase/hbase.version.
>> In HBase 0.90.1, it will wait for the datanode to start up.
>>
>> I tried to dig into the code and found the change in HBase 0.90.2, but I can't find an issue filed for it.
>>
>
> Thanks for digging in.
>
> I don't see the code block you are referring to in HMaster in 0.90.1.
> As per J-D, it's out in FSUtils.java when we get to 0.90 (I checked
> 0.90.0 and it's not there either).
>
> What you are seeing seems similar to:
>
>   HBASE-3502  Can't open region because can't open .regioninfo because
>               AlreadyBeingCreatedException
>
> .... except in your case it's hbase.version.  Is there another master
> running by chance that still has the lease on this file?
>
> Looking at the code, it should be doing as it used to.  We go into
> checkRootDir and the first thing we call is FSUtils.waitOnSafeMode, and
> then we just hang there till dfs says it's left safe mode.
>
> Maybe add some logging in there? (A sketch follows below, after this quoted message.)
>
> St.Ack
>
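
A sketch of the kind of logging suggested above, placed around the
safe-mode wait in the 0.90.2 FSUtils.waitOnSafeMode; the messages and
their placement are assumptions for illustration, not an actual patch:

    DistributedFileSystem dfs = (DistributedFileSystem)fs;
    // Hypothetical diagnostic: report how many datanodes the namenode
    // sees before we start waiting, so a zero-datanode hang is visible
    // in the master log.
    try {
      LOG.info("Datanodes reported in so far: "
          + dfs.getDataNodeStats().length);
    } catch (IOException e) {
      // the datanode report may require superuser privilege; ignore
    }
    while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
      LOG.info("Waiting for dfs to exit safe mode...");
      try {
        Thread.sleep(wait);
      } catch (InterruptedException e) {
        // continue waiting
      }
    }
    // Hypothetical diagnostic: make it explicit that we got past the
    // safe-mode wait before touching /hbase/hbase.version.
    LOG.info("dfs is up and out of safe mode; checking hbase.version");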
