zookeeper-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: HBase dies after some time
Date Sat, 26 May 2012 02:02:58 GMT
These are the exceptions I see in the ZooKeeper log:


2012-05-25 13:56:55,523 - ERROR [CommitProcessor:2:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2012-05-25 13:56:55,523 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x237858fc7a00003, likely client has closed socket
2012-05-25 13:56:55,523 - ERROR [CommitProcessor:2:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2012-05-25 13:56:55,524 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.58.102.92:44170 which had sessionid 0x237858fc7a00003
2012-05-25 13:56:55,524 - ERROR [CommitProcessor:2:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2012-05-25 13:56:55,524 - ERROR [CommitProcessor:2:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)


On Fri, May 25, 2012 at 6:46 PM, N Keywal <nkeywal@gmail.com> wrote:

> Hi,
>
> The master has lost its connection to ZooKeeper. In this case it
> stops, as the consistency of the cluster cannot be ensured. There are
> both a retry count and a timeout setting to control this, but they are
> not the root cause. The default is 10 tries and 3 minutes, so when it
> happens it means you have a serious issue.
> Note that the region servers will continue to work without the master.
> But maybe they have lost their connection to ZK as well (in this case
> they will stop, for the same reason). You should check the network
> between ZK and the master, or look at the ZK logs to check
> that nobody killed it / killed them.
>
> N.
>
> On Sat, May 26, 2012 at 3:22 AM, Something Something
> <mailinglists19@gmail.com> wrote:
> > Hello,
> >
> > I recently installed ZooKeeper & HBase on our dedicated Hadoop cluster on
> > EC2.  HBase stays active for some time, but after a while it dies with
> > error messages similar to these:
> >
> > 2012-05-25 12:09:27,514 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x5378489312c0004-0x5378489312c0004 Received unexpected KeeperException, re-throwing exception
> > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
> >        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
> >        at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
> >        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> > 2012-05-25 12:09:27,514 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x5378489312c0004-0x5378489312c0004 Error deleting our own master address node
> > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
> >        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
> >        at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
> >        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
> >
> >
> >
> >
> > This kills the HMaster as well as all HRegionServers.  Could it be that my
> > ZooKeeper setup is incorrect?  Please help.  Thanks.
>
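
For the suggestion above to check that the ZooKeeper servers are still alive and reachable from the master host, here is a minimal sketch of a liveness probe. It relies on ZooKeeper's "ruok" four-letter command, to which a healthy server replies "imok" on its client port. The class name ZkRuok, the method name, the timeout value and the host:port arguments are illustrative assumptions, not anything taken from this thread. Newer ZooKeeper releases may require the command to be explicitly enabled on the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: probes ZooKeeper servers with the "ruok" four-letter command.
public class ZkRuok {

    // Connects to host:port, sends "ruok", and returns whatever the server
    // answers before closing the connection ("imok" for a healthy server).
    static String ruok(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // bound each read so a hung server cannot block us

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[64];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII).trim();
        }
    }

    // Usage (host names are placeholders): java ZkRuok zk1:2181 zk2:2181 zk3:2181
    public static void main(String[] args) {
        for (String arg : args) {
            String[] hostPort = arg.split(":");
            try {
                String reply = ruok(hostPort[0], Integer.parseInt(hostPort[1]), 3000);
                System.out.println(arg + " -> " + reply);
            } catch (IOException e) {
                // Connection refused or timed out: the server is down or unreachable from here.
                System.out.println(arg + " -> unreachable: " + e);
            }
        }
    }
}
```

Running this from the HBase master host against each ZooKeeper server tests the same network path the master uses. A refused connection or a timeout points at a dead server or a network problem (on EC2, possibly a security group rule) rather than at HBase itself.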
