hbase-user mailing list archives

From Sandy Pratt <prat...@adobe.com>
Subject RE: Catching ZK ConnectionLoss with HTable
Date Mon, 11 Apr 2011 21:13:49 GMT
Thanks J-D.  I'll keep an eye on the Jira.

> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: Monday, April 11, 2011 11:52
> To: user@hbase.apache.org
> Subject: Re: Catching ZK ConnectionLoss with HTable
> 
> I'm cleaning this up in this jira
> https://issues.apache.org/jira/browse/HBASE-3755
> 
> But it's a failure case I haven't seen before; really interesting.
> There's an HTable that's created in the guts of HCM (HConnectionManager)
> that will throw a ZookeeperConnectionException, but it will bubble up as an
> IOE. I'll try to address this too in HBASE-3755.
> 
> J-D
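A minimal sketch, not part of the thread: because the ZooKeeper failure reaches the caller as an IOException, a caller could walk the exception's cause chain to tell ZooKeeper connectivity problems apart from other I/O errors. The class and method names (`ZkFailureCheck`, `isZkConnectionProblem`) are hypothetical. Matching on simple class names (`ConnectionLossException` from the stack trace in the original message, `ZookeeperConnectionException` as named above) is an assumption made so the sketch compiles without HBase or ZooKeeper on the classpath.

```java
public final class ZkFailureCheck {
    private ZkFailureCheck() {}

    // Upper bound on how far we follow getCause(), guarding against
    // pathological cause cycles (A caused by B caused by A).
    private static final int MAX_DEPTH = 32;

    // Hypothetical helper: returns true if any exception in the cause chain
    // has a simple class name that looks like a ZooKeeper connection failure.
    public static boolean isZkConnectionProblem(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_DEPTH; depth++) {
            String name = cur.getClass().getSimpleName();
            // Assumed names: KeeperException$ConnectionLossException, and the
            // HBase ZK connection exception in either capitalization.
            if (name.equals("ConnectionLossException")
                    || name.equalsIgnoreCase("ZooKeeperConnectionException")) {
                return true;
            }
            cur = cur.getCause();
        }
        return false;
    }
}
```

A caller that catches `IOException` around a get or put can use this check to decide whether to abort the job or handle the error some other way.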
> 
> On Mon, Apr 11, 2011 at 11:03 AM, Sandy Pratt <prattrs@adobe.com> wrote:
> > Hi all,
> >
> > I had an issue recently where a scan job I frequently run caught ConnectionLoss and subsequently failed to recover.
> >
> > The stack trace looks like this:
> >
> > 11/04/08 12:20:04 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d8 closed
> > 11/04/08 12:20:04 WARN client.HConnectionManager$ClientZKWatcher: No longer connected to ZooKeeper, current state: Disconnected
> > 11/04/08 12:20:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> > 11/04/08 12:20:05 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d9 closed
> > 11/04/08 12:20:06 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
> > 11/04/08 12:20:06 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:21811 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@51127a
> > 11/04/08 12:20:06 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> > 11/04/08 12:20:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> > 11/04/08 12:20:06 WARN zookeeper.ZooKeeperWrapper: Problem getting stats for /hbase/rs
> > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs
> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> >        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
> >        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
> >        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
> >        at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
> >        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
> >        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:102)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:732)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:677)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:650)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:470)
> >        at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
> >        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1145)
> >        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> >        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.getHBaseTimestamp(EtsAfsBuilder.java:215)
> >        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.syncHour(EtsAfsBuilder.java:310)
> >        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.go(EtsAfsBuilder.java:130)
> >        at BuildAfs.main(BuildAfs.java:43)
> > 11/04/08 12:20:07 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> > 11/04/08 12:20:07 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> > java.net.ConnectException: Connection refused
> >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> > 11/04/08 12:20:09 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
> > 11/04/08 12:20:09 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> >
> > It then goes on to retry endlessly. Killing the spinning job and running it again worked fine, so I would rather have the job crash than retry endlessly.
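One client-side way to get "crash instead of spin," sketched under assumptions and not taken from the thread: run the blocking call on a separate daemon thread and bound it with `Future.get(timeout, unit)`, so the job fails with a `TimeoutException` instead of waiting forever. The names `Deadline` and `callWithTimeout` are hypothetical. This does not stop the client's internal reconnect loop; it only lets the caller stop waiting for it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class Deadline {
    private Deadline() {}

    // Runs a possibly-blocking call on a daemon thread and gives up after the
    // timeout, throwing TimeoutException instead of waiting indefinitely.
    public static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-call");
            t.setDaemon(true); // a stuck worker will not keep the JVM alive
            return t;
        });
        try {
            Future<T> f = exec.submit(call);
            try {
                return f.get(timeout, unit);
            } catch (TimeoutException e) {
                f.cancel(true); // interrupts the worker; the call may ignore it
                throw e;
            } catch (ExecutionException e) {
                // Unwrap so callers see the original exception (e.g. IOException).
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            exec.shutdownNow();
        }
    }
}
```

For example, `Result result = Deadline.callWithTimeout(() -> timestampsTable.get(get), 60, TimeUnit.SECONDS);` (Java 8 lambda syntax). After a timeout the abandoned worker may still be using the `HTable`, so letting the process exit is safer than continuing to reuse that table.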
> >
> > I'm not especially concerned about what caused the ConnectionLoss in the first place, but I am interested in being able to configure how the ZK exceptions are handled gracefully. For example, the call site in my code leading to the exception is this:
> >
> > Get get = new Get(Bytes.toBytes(level.rowKeyDateFormat.format(dts)));
> > Result result = timestampsTable.get(get);
> >
> > I suppose this means that if I want to catch ConnectionLoss in my code, I have to wrap all my gets and puts with that catch block. Or maybe just the first one? It seems like HTable and friends might be able to catch this exception in a more central location, maybe somewhere in here:
> >
> > at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
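To avoid repeating a catch block at every get and put, each call could go through one helper that owns the failure policy. This is a minimal sketch that assumes a bounded retry budget is the desired policy. `RetryPolicy`, `IoCall`, and `withRetries` are hypothetical names, not HBase API. The helper only works when the client call actually throws; it cannot bound a call that never returns.

```java
import java.io.IOException;

public final class RetryPolicy {
    private RetryPolicy() {}

    // Hypothetical functional interface for any client operation
    // (get, put, ...) that may throw IOException.
    @FunctionalInterface
    public interface IoCall<T> {
        T call() throws IOException;
    }

    // Single choke point: all gets/puts go through here, so the retry and
    // give-up policy lives in one place rather than at every call site.
    public static <T> T withRetries(IoCall<T> op, int maxAttempts, long pauseMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // remember the most recent failure
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        throw new IOException("interrupted while retrying", ie);
                    }
                }
            }
        }
        throw last; // budget exhausted: propagate so the job fails
    }
}
```

For example, `Result result = RetryPolicy.withRetries(() -> timestampsTable.get(get), 3, 1000L);`. Once the budget is exhausted, the last `IOException` propagates. If `main` does not catch it, the job ends instead of spinning.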
> >
> > I'm running HBase 0.89.20100924+28. Will this issue go away if I upgrade to a newer version?
> >
> > Thanks,
> > Sandy
> >
