hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: how to recover hbase
Date Thu, 24 Jun 2010 16:35:05 GMT
Trying to decode what the exceptions mean without any context is
extremely hard. Your configuration looks good except for:

 <property>
   <name>hbase.regionserver.dns.interface</name>
   <value>192.168.1.122</value>
   <description></description>
 </property>

It expects an interface name (like eth0), not an IP address. And setting this alone:

 <property>
   <name>hbase.zookeeper.dns.nameserver</name>
   <value>192.168.1.122</value>
   <description></description>
 </property>

won't work either; you also need to set the corresponding interface property.
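Together, those two properties would need to look something like the following (a sketch only: eth0 and ns1.idfs.cn are assumed values for illustration; check your actual interface with ifconfig, and the nameserver property takes the hostname of your DNS server, not the host's own IP):

```xml
<!-- Sketch only: eth0 and ns1.idfs.cn are assumed values for illustration. -->
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <name>hbase.zookeeper.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <name>hbase.zookeeper.dns.nameserver</name>
  <value>ns1.idfs.cn</value>
</property>
```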

Let's try something: stop all the processes (kill -9 if needed),
then wipe the logs. Start fresh, zip up all the logs, and send them
to me directly.

J-D

On Thu, Jun 24, 2010 at 1:42 AM, 梁景明 <futureha@gmail.com> wrote:
> I don't know how to describe my situation any better; I just want to
> restart successfully and get my data back.
> 1. bin/start-hbase.sh shows everything running.
> 2. bin/stop-hbase.sh can't stop it normally.
> 3. Sometimes a regionserver can't be seen. After killing the master
> process and rerunning bin/start-hbase.sh it looks OK, but the master
> doesn't work.
> 4. Hadoop HDFS runs fine, and on port 50070 I can read the /hbase folders.
> 5. Here is my hbase-site.xml. test1 and s1.idfs.cn are the same IP,
> 192.168.1.122. I first set s1.idfs.cn in hbase.zookeeper.quorum, but it
> only recognized the hostname test1. s1.idfs.cn comes from my DNS.
> <configuration>
>  <property>
>    <name>hbase.rootdir</name>
>    <value>hdfs://s1.idfs.cn:9000/hbase</value>
>    <description>The directory shared by region servers.
>    </description>
>  </property>
>  <property>
>    <name>hbase.cluster.distributed</name>
>    <value>true</value>
>    <description>
>    </description>
>  </property>
>  <property>
>    <name>fs.default.name</name>
>    <value>hdfs://s1.idfs.cn:9000</value>
>    <description></description>
>  </property>
>  <property>
>    <name>hbase.zookeeper.dns.nameserver</name>
>    <value>192.168.1.122</value>
>    <description></description>
>  </property>
>  <property>
>    <name>hbase.regionserver.dns.interface</name>
>    <value>192.168.1.122</value>
>    <description></description>
>  </property>
> <property>
>    <name>hbase.zookeeper.property.clientPort</name>
>    <value>2222</value>
>    <description>Property from ZooKeeper's config zoo.cfg.
>    The port at which the clients will connect.
>    </description>
>  </property>
>  <property>
>    <name>hbase.zookeeper.quorum</name>
>    <value>test1</value>
>  </property>
> </configuration>
>
> My regionservers file is:
> s1.idfs.cn
> s2.idfs.cn
>
> HBase ran OK the first time, and I created tables and inserted data.
>
> 6. I tried bin/zkCli.sh -server 192.168.1.122:2222 to look at /hbase in
> ZooKeeper; maybe there is some useful info for you. Thanks.
>
> [zk: 192.168.1.122:2222(CONNECTED) 0] ls /
> [hbase, zookeeper]
> [zk: 192.168.1.122:2222(CONNECTED) 16] ls /hbase
> [safe-mode, root-region-server, rs, master, shutdown]
>
> hbase shows up in /.
>
> [zk: 192.168.1.122:2222(CONNECTED) 10] get /hbase/master
> 192.168.1.122:60000
> cZxid = 0x1c
> ctime = Thu Jun 24 14:39:21 CST 2010
> mZxid = 0x1c
> mtime = Thu Jun 24 14:39:21 CST 2010
> pZxid = 0x1c
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x12968ae99ca0000
> dataLength = 19
> numChildren = 0
>
> That's my master, 192.168.1.122.
>
> [zk: 192.168.1.122:2222(CONNECTED) 14] get /hbase/root-region-server
> 192.168.1.123:60020
> cZxid = 0xa
> ctime = Thu Jun 24 10:38:00 CST 2010
> mZxid = 0x25
> mtime = Thu Jun 24 14:39:31 CST 2010
> pZxid = 0xa
> cversion = 0
> dataVersion = 1
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 19
> numChildren = 0
>
> I set up two region servers, but only one appears here.
>
> [zk: 192.168.1.122:2222(CONNECTED) 11] get /hbase/shutdown
> up
> cZxid = 0x1d
> ctime = Thu Jun 24 14:39:21 CST 2010
> mZxid = 0x1d
> mtime = Thu Jun 24 14:39:21 CST 2010
> pZxid = 0x1d
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 2
> numChildren = 0
>
> [zk: 192.168.1.122:2222(CONNECTED) 12] get /hbase/rs
>
> cZxid = 0x6
> ctime = Thu Jun 24 10:37:28 CST 2010
> mZxid = 0x6
> mtime = Thu Jun 24 10:37:28 CST 2010
> pZxid = 0x21
> cversion = 6
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 2
>
> [zk: 192.168.1.122:2222(CONNECTED) 19] ls /hbase/safe-mode
> []
>
>
>
>
> 2010/6/24 梁景明 <futureha@gmail.com>
>
>> Some more details: when I kill the HBase processes and restart,
>> the regionserver on 60030 comes up and can be seen; it started OK.
>> But the master on 60010 shows this, and the /hbase data is still in
>> Hadoop HDFS. That's what I want to say:
>> the /hbase data stays, but I can't find any way to start HBase again.
>>
>>
>> HTTP ERROR: 500
>>
>> Trying to contact region server null for region , row '', but failed after 3 attempts.
>> Exceptions:
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Failed setting up proxy to /192.168.1.123:60020 after attempts=1
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Failed setting up proxy to /192.168.1.123:60020 after attempts=1
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Failed setting up proxy to /192.168.1.123:60020 after attempts=1
>>
>> RequestURI=/master.jsp
>> Caused by:
>>
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 3 attempts.
>> Exceptions:
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Failed setting up proxy to /192.168.1.123:60020 after attempts=1
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Failed setting up proxy to /192.168.1.123:60020 after attempts=1
>>
>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region because: Failed setting up proxy to /192.168.1.123:60020 after attempts=1
>>
>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1055)
>>
>>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:75)
>>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:48)
>>       at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:454)
>>
>>       at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
>>       at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:132)
>>       at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
>>
>>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>       at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>>       at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
>>
>>       at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>       at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>       at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>
>>       at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>>       at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>       at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>
>>       at org.mortbay.jetty.Server.handle(Server.java:324)
>>       at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>>       at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>>
>>       at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>>       at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>>       at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>>       at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>>
>>       at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>>
>> *Powered by Jetty:// <http://jetty.mortbay.org/>*
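The repeated "Failed setting up proxy to /192.168.1.123:60020" lines mean the master simply cannot open a TCP connection to the regionserver port. A quick way to verify that from the master box, independent of HBase (a minimal sketch; the host and port are taken from the exception above):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_connect("192.168.1.123", 60020)` returns False while the regionserver process is running on that machine, look for a firewall or a wrong bind address.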
>>
>> 2010/6/24 梁景明 <futureha@gmail.com>
>>
>>> Exactly like this. It's some problem with ZooKeeper; I'm not sure what
>>> happened to it.
>>> Everything reports as started, but ports 60030 and 60010 are not OK.
>>>
>>> ---------------------------------------------------------------------------
>>> futureha@test1:~/hbase$ bin/start-hbase.sh
>>> test1: zookeeper running as process 18596. Stop it first.
>>> master running as process 20047. Stop it first.
>>> s1.idfs.cn: regionserver running as process 18829. Stop it first.
>>> s2.idfs.cn: regionserver running as process 18763. Stop it first.
>>>
>>> ------------------------------------------------------------------------------------------
>>>
>>> The HBase logs give me the following, and I don't know how to deal
>>> with it. If ZooKeeper is dead or has some problem, what do I do?
>>> stop-hbase.sh and start-hbase.sh don't work at all.
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------
>>> 2010-06-24 11:33:29,713 WARN
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase
>>> -- check quorum servers, currently=test1:2222
>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> KeeperErrorCode = ConnectionLoss for /hbase
>>>     at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>     at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
>>>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:808)
>>>     at
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:405)
>>>     at
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:432)
>>>     at
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeMasterAddress(ZooKeeperWrapper.java:520)
>>>     at
>>> org.apache.hadoop.hbase.master.HMaster.writeAddressToZooKeeper(HMaster.java:260)
>>>     at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:242)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>> Method)
>>>     at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>     at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>     at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1230)
>>>     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1271)
>>> 2010-06-24 11:33:31,202 INFO org.apache.zookeeper.ClientCnxn: Attempting
>>> connection to server test1/192.168.1.122:2222
>>> 2010-06-24 11:33:31,203 INFO org.apache.zookeeper.ClientCnxn: Priming
>>> connection to java.nio.channels.SocketChannel[connected local=/
>>> 192.168.1.122:52706 remote=test1/192.168.1.122:2222]
>>> 2010-06-24 11:33:31,203 INFO org.apache.zookeeper.ClientCnxn: Server
>>> connection successful
>>> 2010-06-24 11:33:31,204 WARN org.apache.zookeeper.ClientCnxn: Exception
>>> closing session 0x0 to sun.nio.ch.SelectionKeyImpl@163f7a1
>>> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
>>> lim=4 cap=4]
>>>     at
>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
>>> 2010-06-24 11:33:31,204 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown input
>>> java.net.SocketException: Transport endpoint is not connected
>>>     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>     at
>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2010-06-24 11:33:31,204 WARN org.apache.zookeeper.ClientCnxn: Ignoring
>>> exception during shutdown output
>>> java.net.SocketException: Transport endpoint is not connected
>>>     at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
>>>     at
>>> sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651)
>>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
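Given the repeated ConnectionLoss above, it is worth checking whether the ZooKeeper server on test1:2222 is actually healthy. ZooKeeper replies "imok" to the four-letter "ruok" command sent over a raw TCP connection; a small probe might look like this (a sketch; host and port are taken from the log above):

```python
import socket

def zk_ruok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command; True iff it answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ruok")
            # a healthy server replies with exactly b"imok"
            return s.recv(4) == b"imok"
    except OSError:
        return False
```

If `zk_ruok("192.168.1.122", 2222)` returns False while the zookeeper process is still running, the process is up but not serving, which would match the ConnectionLoss errors above.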
>>>
>>>
>>>
>>> 2010/6/22 Jean-Daniel Cryans <jdcryans@apache.org>
>>>
>>>> I'm not sure I understand what you describe, and since you didn't post
>>>> any output from your logs, it's really hard to help you debug.
>>>>
>>>> What's the problem exactly and do you see any exception in the logs?
>>>>
>>>> J-D
>>>>
>>>> On Mon, Jun 21, 2010 at 2:48 AM, 梁景明 <futureha@gmail.com> wrote:
>>>> > After reading "Description of how HBase uses ZooKeeper", I think my
>>>> > problem may be that the regionserver session in ZK was lost!
>>>> >
>>>> > bin/start-hbase.sh can't start HBase successfully.
>>>> >
>>>> > Is it because something was lost when they connected to ZooKeeper?
>>>> >
>>>> > To start it, one idea: start ZooKeeper alone, delete "/hbase" in it,
>>>> > and run the start-hbase.sh script again?
>>>> >
>>>> > Will that be OK?
>>>> >
>>>> > 2010/6/19 Jean-Daniel Cryans <jdcryans@apache.org>
>>>> >
>>>> >> > Do you mean that if ZooKeeper is dead, the data will be lost?
>>>> >>
>>>> >> If your ZooKeeper ensemble is dead, then HBase will be unavailable,
>>>> >> but you won't lose any data. And even if your ZooKeeper data is
>>>> >> wiped out, like I said, it's only runtime data so it doesn't matter.
>>>> >>
>>>> >> >
>>>> >> > In that case, if ZooKeeper lost .META. or -ROOT-, can the data in
>>>> >> > Hadoop never be recovered, even though there are some table
>>>> >> > folders in Hadoop?
>>>> >>
>>>> >> HBase stores the location of -ROOT- in ZooKeeper, and that changes
>>>> >> every time the region moves. Losing it won't make -ROOT- disappear
>>>> >> forever; it's still in HDFS.
>>>> >>
>>>> >> Does it answer the question? (I'm not sure I fully understand you)
>>>> >>
>>>> >> J-D
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
