hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-1921) When the Master's session times out and there's only one, cluster is wedged
Date Tue, 20 Oct 2009 18:07:59 GMT

     [ https://issues.apache.org/jira/browse/HBASE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-1921:
--------------------------------------

    Attachment: HBASE-1921.patch

Attached a patch that does what I described. Here is what the Master log shows when the session expires:

{code}
2009-10-20 10:53:38,708 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:39,997 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /10.10.1.58:2181
2009-10-20 10:53:39,998 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.1.58:56099 remote=/10.10.1.58:2181]
2009-10-20 10:53:39,998 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-20 10:53:40,000 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12472fd41f10004 to sun.nio.ch.SelectionKeyImpl@2afb6c5f
java.io.IOException: Session Expired
	at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
	at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2009-10-20 10:53:40,000 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:40,000 INFO org.apache.hadoop.hbase.master.HMaster: Master lost its znode, trying to get a new one
2009-10-20 10:53:40,000 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x12472fd41f10004
2009-10-20 10:53:40,000 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x12472fd41f10004
2009-10-20 10:53:40,001 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x12472fd41f10004
2009-10-20 10:53:40,001 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12472fd41f10004 closed
2009-10-20 10:53:40,001 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper
2009-10-20 10:53:40,003 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.10.1.58:2181 sessionTimeout=60000 watcher=Thread[HMaster,5,main]
2009-10-20 10:53:40,003 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /10.10.1.58:2181
2009-10-20 10:53:40,005 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.1.58:56100 remote=/10.10.1.58:2181]
2009-10-20 10:53:40,006 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-20 10:53:40,009 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:40,012 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Wrote master address 10.10.1.58:60000 to ZooKeeper
2009-10-20 10:53:40,016 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.10.1.58:60000
2009-10-20 10:53:40,017 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state...
2009-10-20 10:53:40,017 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.10.1.58:60020
2009-10-20 10:53:40,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/rs/1256061062528 got 10.10.1.58:60020
2009-10-20 10:53:40,019 INFO org.apache.hadoop.hbase.master.HMaster: This is a failover, ZK inspection begins...
2009-10-20 10:53:40,020 DEBUG org.apache.hadoop.hbase.master.HMaster: Inspection found server 10.10.1.58
2009-10-20 10:53:40,022 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1256061062528 with data 10.10.1.58:60020
2009-10-20 10:53:40,028 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: SetData of ZNode /hbase/root-region-server with 10.10.1.58:60020
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Inspection found 3 regions, with -ROOT-
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Found log folder : 10.10.1.58,60020,1256061062528
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Log folder belongs to an existing region server
2009-10-20 10:53:40,029 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2009-10-20 10:54:38,601 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 3.0
2009-10-20 10:54:38,602 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.10.1.58:60020, regionname: -ROOT-,,0, startKey: <>}
2009-10-20 10:54:38,607 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.10.1.58:60020, regionname: .META.,,1, startKey: <>}
2009-10-20 10:54:38,611 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.10.1.58:60020, regionname: -ROOT-,,0, startKey: <>} complete
2009-10-20 10:54:38,615 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 1 row(s) of meta region {server: 10.10.1.58:60020, regionname: .META.,,1, startKey: <>} complete
2009-10-20 10:54:38,615 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
{code}
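
For readers following the log, the recovery sequence it shows (notice the lost znode, close the expired session, open a fresh one, write the master address again) can be sketched roughly as below. This is only an illustration and not the code in HBASE-1921.patch: the `Coordinator` interface, `MasterRecovery` and `FakeCoordinator` are hypothetical names invented here, and the in-memory fake stands in for a real ZooKeeper ensemble so the sketch needs no ZooKeeper dependency.

```java
import java.util.List;

/** Hypothetical stand-in for a ZooKeeper wrapper; not an HBase or ZooKeeper API. */
interface Coordinator {
    void close();                               // drop the expired session
    Coordinator reconnect();                    // open a fresh session
    boolean writeMasterAddress(String address); // claim the master znode if it is free
}

final class MasterRecovery {
    /** Returns true if the master re-registered itself after a session expiry. */
    static boolean recover(Coordinator expired, String address, List<String> log) {
        log.add("Master lost its znode, trying to get a new one");
        expired.close();                         // the old session is unusable
        Coordinator fresh = expired.reconnect(); // new session, new connection
        boolean ok = fresh.writeMasterAddress(address);
        log.add(ok ? "Wrote master address " + address
                   : "Master znode is held by another master");
        return ok;
    }
}

/** In-memory fake (an assumption for illustration): one shared slot models the master znode. */
final class FakeCoordinator implements Coordinator {
    private final String[] masterZnode; // shared across sessions; null means "no znode"
    boolean closed = false;

    FakeCoordinator() { this(new String[1]); }
    private FakeCoordinator(String[] shared) { this.masterZnode = shared; }

    @Override public void close() { closed = true; }

    @Override public Coordinator reconnect() { return new FakeCoordinator(masterZnode); }

    @Override public boolean writeMasterAddress(String address) {
        // A closed session cannot write, and an existing znode is never overwritten.
        if (closed || masterZnode[0] != null) return false;
        masterZnode[0] = address;
        return true;
    }

    String masterAddress() { return masterZnode[0]; }
}
```

Calling `MasterRecovery.recover(new FakeCoordinator(), "10.10.1.58:60000", log)` models the single-master case from the log: the znode is gone, so the write succeeds. The cluster state check and failover inspection that follow in the log are outside this sketch.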

> When the Master's session times out and there's only one, cluster is wedged
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-1921
>                 URL: https://issues.apache.org/jira/browse/HBASE-1921
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.2, 0.21.0
>
>         Attachments: HBASE-1921.patch
>
>
> On IRC, some fella had a session expiration on his Master and had only one. Maybe in this case the Master should first try to re-get the znode?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

