hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HMaster fails to start up, Failed construction of Master exception
Date Fri, 11 Mar 2011 19:41:23 GMT
I think the issue that your region server first suffered from may be
related; you might want to investigate that.

J-D

On Fri, Mar 11, 2011 at 11:38 AM, Nichole Treadway <kntreadway@gmail.com> wrote:
> Alright, I think I've got it working now. I increased the HBASE_HEAPSIZE
> value in hbase-env.sh and the HMaster finally started up, and it looks like
> it's working as normal now.
> I'm not really sure what caused this problem in the first place, though,
> since I've never encountered it before.
> My cells aren't fat but my table is very large, ~400 columns, two column
> families.
> Thank you for your help.
> On Fri, Mar 11, 2011 at 2:28 PM, Nichole Treadway <kntreadway@gmail.com>
> wrote:
>>
>> Sorry for not including that information in my original email.
>> Cluster Info:
>> I'm running the hadoop-0.20-append branch and HBase 0.90.1, and Java 1.6.
>> All machines are 64-bit running Red Hat 5.5.
>>
>> I have a small cluster of 4 nodes all acting as datanodes and
>> regionservers. Replication in my cluster is set to 3.
>> As an update, I removed all regionservers except my master from the
>> regionservers list and from the zookeeper quorum list in hbase-site.xml. I
>> started up HBase again and was no longer seeing the "Failed Construction of
>> Master" errors I mentioned in my previous email. HMaster started up more
>> normally this time and began reading HLog files. It then printed a message
>> about not being able to contact some of my regionservers and quit again.
>> I added all the regionservers back again to the regionservers list and the
>> zookeeper quorum list. Now the master starts up, spends several minutes
>> printing messages about HLog files, and then fails again with the following
>> error:
>> 2011-03-11 14:17:58,197 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:1970)
>>     at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:1977)
>>     at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:118)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1758)
>>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1886)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:180)
>>     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
>>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
>> On Fri, Mar 11, 2011 at 1:20 PM, Jean-Daniel Cryans <jdcryans@apache.org>
>> wrote:
>>>
>>> Please include relevant basic information when asking that sort of
>>> question, such as hbase/hadoop version, hardware, OS, java version,
>>> cluster setup, etc.
>>>
>>> The exceptions seem to indicate that it's having a hard time getting
>>> data from zookeeper? Have you checked the zookeeper log(s)?
>>>
>>> Maybe that's a red herring tho, but without any context those lines of
>>> log could mean anything.
>>>
>>> J-D
>>>
>>> On Fri, Mar 11, 2011 at 8:49 AM, Nichole Treadway <kntreadway@gmail.com>
>>> wrote:
>>> > Last night I was putting pretty heavy load on my HBase cluster. One of
>>> > the region servers shut down unexpectedly, and I restarted the
>>> > regionserver, but HBase still wasn't assigning regions to it. I
>>> > attempted to move regions using the HBase shell but regions were still
>>> > not being assigned to it. In the past when this has happened, I've just
>>> > restarted HBase and it's been fine. I attempted to do this, but now
>>> > HBase is failing to start up at all.
>>> >
>>> > In my HMaster logs, here's the message I'm getting.
>>> >
>>> > 2011-03-11 11:30:51,014 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to myip1/myip1:2181, initiating session
>>> >
>>> > 2011-03-11 11:31:04,004 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
>>> >
>>> > 2011-03-11 11:31:04,107 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
>>> > java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
>>> >        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1064)
>>> >        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
>>> >        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102)
>>> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>> >        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>>> >        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1078)
>>> > Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
>>> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>> >        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>>> >        at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
>>> >        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
>>> >        at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:218)
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>> >        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>> >        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>> >        at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1059)
>>> >        ... 5 more
>>> >
>>> >
>>> > -------------------
>>> >
>>> >
>>> > Errors I'm seeing in the Zookeeper logs:
>>> >
>>> >
>>> > 2011-03-11 11:30:47,479 WARN org.apache.zookeeper.server.quorum.Learner: Unexpected exception, tries=0, connecting to /myip:2888
>>> > java.net.ConnectException: Connection refused
>>> >        at java.net.PlainSocketImpl.socketConnect(Native Method)
>>> >        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>> >        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>> >        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>> >        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>> >        at java.net.Socket.connect(Socket.java:529)
>>> >        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:212)
>>> >        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
>>> >        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
>>> >
>>> > 2011-03-11 11:32:37,091 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Interrupted while waiting for message on queue
>>> > java.lang.InterruptedException
>>> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
>>> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2038)
>>> >        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
>>> >        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:601)
>>> >
>>> > 2011-03-11 11:32:18,671 ERROR org.apache.zookeeper.server.quorum.QuorumCnxManager: Failed to send last message. Shutting down thread.
>>> > java.nio.channels.AsynchronousCloseException
>>> >        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
>>> >        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
>>> >        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:579)
>>> >        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:588)
>>> >
>>
>
>
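
For readers landing on this thread: the fix reported above was raising HBASE_HEAPSIZE in hbase-env.sh so the master had enough heap to split the HLog files at startup. A minimal sketch of that setting follows. The value 4000 is an assumption chosen for illustration; the thread does not say what value was actually used, and in HBase of this era the setting is understood to be in MB.

```shell
# conf/hbase-env.sh (fragment)
# Maximum heap for the HBase daemons, interpreted as MB.
# 4000 is a hypothetical value; the original poster's value is not given.
export HBASE_HEAPSIZE=4000
```

HBase needs to be restarted for the new heap size to take effect.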
