hbase-user mailing list archives

From Tatsuya Kawano <tatsuy...@snowcocoa.info>
Subject Re: HBase 0.20.1 on Ubuntu 9.04: master fails to start
Date Thu, 29 Oct 2009 12:25:02 GMT
Hi Artyom,

I should have made it clear that I was giving you advice on only one of
the problems you have had. It seems you have at least three different
problems:

In your first email:
1. The HBase master went down after a few days of testing.
2. HBase didn't start again because of an HDFS error.

In your last email:
3. The HBase region server stopped responding after you tried to scan a
table with 6 million rows several times.

The possible cause and solution I have been describing apply only to
problem #2, not to the others.


So, for problem #3, your master and client logs show that the HBase
region server is not responding on port 53169, but they don't say why.
You should have a region server log in the logs directory as well; can
you check it for any error messages?
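
For example, here is a minimal way to check (the log location and file
name pattern below assume the default 0.20 layout under
$HBASE_HOME/logs; adjust them for your install):

    # Is the region server process still alive, and is anything
    # listening on its port?
    jps | grep HRegionServer
    netstat -tlnp | grep 53169

    # Look for errors around the time of the failure in the region
    # server log
    grep -E 'ERROR|FATAL|OutOfMemoryError' \
        $HBASE_HOME/logs/hbase-*-regionserver-*.log | tail -50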


Also, in your first mail, you said your server has only 2GB of RAM.

On Tue, Oct 27, 2009 at 11:24 AM, Artyom Shvedchikov <shoolc@gmail.com> wrote:
> We are testing the latest HBase 0.20.1 in pseudo-distributed mode with
> Hadoop 0.20.1 in the following environment:
> *h/w*: Intel C2D 1.86 GHz, RAM 2 GB 667 MHz, HDD 1 TB Seagate SATA2 7200 RPM
> *s/w*: Ubuntu 9.04, filesystem type is *ext3*, Java 1.6.0_16-b01, Hadoop
> 0.20.1, HBase 0.20.1


2GB of RAM is definitely too small to fit an entire Hadoop and HBase
cluster. Be aware that you are trying to run all of the following Java
processes on that one server (a quick way to see what they actually
consume is sketched after the list):

1. Hadoop DFS Name Node
2. Hadoop DFS Secondary Name Node
3. Hadoop DFS Data Node
4. ZooKeeper
5. HBase Master
6. HBase Region Server
7. Hadoop Job Tracker
8. Hadoop Task Tracker
9. … Hadoop Map/Reduce processes
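
For reference, a quick way to list these JVMs and see what they consume
(jps ships with the JDK; the ps options below are Linux-specific):

    # List the running Hadoop/HBase JVMs by main class
    jps -l

    # Show each Java process with its resident memory (RSS, in KB)
    ps -o pid,rss,args -C java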


I would suggest having at least 8GB of RAM for even light-load testing,
and adding more servers for heavy-load testing. HBase is memory
intensive, and it becomes unstable when it doesn't have enough memory. I
bet your problems #1 and #3 will magically disappear if you add more RAM
to your server.
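
Until you can add RAM, you could at least cap the heaps explicitly so
the JVMs don't overcommit the 2GB. A minimal sketch, assuming the stock
0.20-era config files (HADOOP_HEAPSIZE and HBASE_HEAPSIZE are the
standard knobs there; the values are only illustrative):

    # conf/hadoop-env.sh -- max heap, in MB, for each Hadoop daemon
    export HADOOP_HEAPSIZE=256

    # conf/hbase-env.sh -- max heap, in MB, for each HBase daemon
    export HBASE_HEAPSIZE=512

Even then, six or more daemons at a few hundred MB each leave little for
the OS page cache, so treat this as a stopgap, not a fix.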


So check the region server log, and consider adding more RAM to your
server.

Thanks,


-- 
Tatsuya Kawano (Mr.)
Tokyo, Japan



On Thu, Oct 29, 2009 at 7:06 PM, Artyom Shvedchikov <shoolc@gmail.com> wrote:
> Hello, Tatsuya.
>
> Yesterday we ran into the same problem - the master was down.
>
> Here is part of the HBase master log from after HBase became unavailable
> through the HBase shell and the Java HBase client.
>
>
>> 2009-10-29 00:00:36,920 INFO org.apache.hadoop.hbase.master.ServerManager:
>> 1 region servers, 0 dead, average load 5.0
>> 2009-10-29 00:00:37,150 INFO org.apache.hadoop.hbase.master.BaseScanner:
>> RegionManager.rootScanner scanning meta region {server: 127.0.0.1:53169,
>> regionname: -ROOT-,,0, startKe$
>> 2009-10-29 00:00:37,151 WARN org.apache.hadoop.hbase.master.BaseScanner:
>> Scan ROOT region
>> java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:308)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:831)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
>>         at
>> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
>>         at $Proxy2.openScanner(Unknown Source)
>>         at
>> org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
>>         at
>> org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
>>         at
>> org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
>>         at
>> org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
>>         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>>
> and this repeats to the end of the log.
>
> ZooKeeper log part:
>
> 2009-10-29 02:45:14,138 INFO
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 17b
>> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn:
>> Connected to /127.0.0.1:56897 lastZxid 0
>> 2009-10-29 08:34:04,751 INFO org.apache.zookeeper.server.NIOServerCnxn:
>> Creating new session 0x124962e1e530013
>> 2009-10-29 08:34:04,776 INFO org.apache.zookeeper.server.NIOServerCnxn:
>> Finished init of 0x124962e1e530013 valid:true
>> 2009-10-29 08:34:09,689 WARN
>> org.apache.zookeeper.server.PrepRequestProcessor: Got exception when
>> processing sessionid:0x124962e1e530013 type:create cxid:0x2
>> zxid:0xfffffffffff$
>> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
>> NodeExists
>>         at
>> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:245)
>>         at
>> org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:114)
>>
> and this repeats to the end of the log.
>
> HBase became unavailable after we tried to scan a table with 6,000,000
> rows several times.
>
> HBase Java client log:
>
> Error during table scan: java.lang.RuntimeException:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>> region server 127.0.0.1:53169 for region channel_products,,1256660737751,
>> row '', but failed after 10 attempts.
>> Exceptions:
>> java.lang.NoClassDefFoundError: org/mortbay/log/Log
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>> java.net.ConnectException: Connection refused
>>
>
> HBase shell log:
>
> hbase@localhost:/hadoop$ ./hbase/bin/hbase shell
>> HBase Shell; enter 'help<RETURN>' for list of supported commands.
>> Version: 0.20.1, r822817, Wed Oct  7 11:55:42 PDT 2009
>> hbase(main):001:0> status
>> 1 servers, 0 dead, 5.0000 average load
>> hbase(main):002:0> list
>> 09/10/29 08:34:09 INFO ipc.HbaseRPC: Server at /127.0.0.1:53169 could not
>> be reached after 1 tries, giving up.
>> [... the same message repeats, with advancing timestamps, 49 more times
>> over the next two minutes; the last entry is at 09/10/29 08:35:59 ...]
>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Trying to contact region server null for region , row '', but failed after 5
>> attempts.
>> Exceptions:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up
>> proxy to /127.0.0.1:53169 after attempts=1
>>
>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
>> `getRegionServerWithRetries'
>>     from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>>     from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
>> `listTables'
>>     from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>>     from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>>     from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>>     from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>>     from java/lang/reflect/Method.java:597:in `invoke'
>>     from org/jruby/javasupport/JavaMethod.java:298:in
>> `invokeWithExceptionHandling'
>>     from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>>     from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>     from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>>     from org/jruby/ast/ForNode.java:104:in `interpret'
>> ... 112 levels...
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>>     from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
>> `call'
>>     from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
>> `call'
>>     from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
>> `call'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>>     from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>>     from hadoop/hbase/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>>     from org/jruby/Ruby.java:577:in `runScript'
>>     from org/jruby/Ruby.java:480:in `runNormally'
>>     from org/jruby/Ruby.java:354:in `runFromMain'
>>     from org/jruby/Main.java:229:in `run'
>>     from org/jruby/Main.java:110:in `run'
>>     from org/jruby/Main.java:94:in `main'
>>     from /hadoop/hbase/bin/../bin/hirb.rb:338:in `list'
>>     from (hbase):3
>> hbase(main):003:0>
>>
>
> The HDFS name node is still available through the web interface:
>
> NameNode 'localhost:9000'
> Started: Tue Oct 27 03:12:08 EET 2009
> Version: 0.20.1, r810220
> Compiled: Tue Sep 1 20:55:56 UTC 2009 by oom
> Upgrades: There are no upgrades in progress.
>
> Browse the filesystem <http://77.122.169.205:50070/nn_browsedfscontent.jsp>
> Namenode Logs <http://77.122.169.205:50070/logs/>
>
> Cluster Summary: 116 files and directories, 98 blocks = 214 total.
> Heap Size is 10.94 MB / 963 MB (1%)
>
> Configured Capacity : 229.36 GB
> DFS Used : 3.04 GB
> Non DFS Used : 14.46 GB
> DFS Remaining : 211.87 GB
> DFS Used% : 1.32 %
> DFS Remaining% : 92.37 %
> Live Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=LIVE> : 1
> Dead Nodes <http://77.122.169.205:50070/dfsnodelist.jsp?whatNodes=DEAD> : 0
>
> NameNode Storage:
> Storage Directory: /hadoop/hdfs/name   Type: IMAGE_AND_EDITS   State: Active
>
> Hadoop <http://hadoop.apache.org/core>, 2009.
>
>
> Could you check this? Maybe some other ideas will come to mind.
>
> Thanks a lot for your time.
> -------------------------------------------------
> Best wishes, Artyom Shvedchikov
