hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-3266) Master does not seem to properly scan ZK for running RS during startup
Date Tue, 26 Jul 2011 20:36:09 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HBASE-3266.
---------------------------

    Resolution: Not A Problem

From Todd:
3266 is probably no longer valid given heartbeats don't exist in trunk.

> Master does not seem to properly scan ZK for running RS during startup
> ----------------------------------------------------------------------
>
>                 Key: HBASE-3266
>                 URL: https://issues.apache.org/jira/browse/HBASE-3266
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> I was in the situation described by HBASE-3265, where I had a number of RS waiting on ROOT, but the master hadn't seen any RS checkins, so it was waiting on checkins. To get past this, I restarted one of the region servers. The restarted server checked in, and the master began its startup.
> At this point the master started scanning /hbase/.logs for things to split. It correctly identified that the RS on haus01 was running (this is the one I restarted):
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/haus01.sf.cloudera.com,60020,1290500443143 belongs to an existing region server
> but then incorrectly decided that the RS on haus02 was down:
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/haus02.sf.cloudera.com,60020,1290498411450 doesn't belong to a known region server, splitting
> However, ZK shows that this RS is up:
> [zk: haus01.sf.cloudera.com:2222(CONNECTED) 3] ls /hbase/rs
> [haus04.sf.cloudera.com,60020,1290498411533, haus05.sf.cloudera.com,60020,1290498411520, haus03.sf.cloudera.com,60020,1290498411518, haus01.sf.cloudera.com,60020,1290500443143, haus02.sf.cloudera.com,60020,1290498411450]
> splitLogsAfterStartup seems to check ServerManager.onlineServers, which, as best I can tell, is derived from heartbeats and not from ZK (sorry if I got some of this wrong, I'm still new to this codebase).
> Of course, the master went into an infinite splitting loop at this point, since haus02 is up and renewing its DFS lease on its logs.
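The decision the report questions can be sketched as a set-membership test: a log folder is split only if its server name is absent from the set of live servers, and the bug is about which set is consulted. The following is a minimal, language-neutral sketch in Python, not HBase code. It assumes the children of /hbase/rs have already been listed (as in the zkCli output in the report) and that each per-server log folder is named exactly like its znode ("host,port,startcode"). `server_name_from_log_dir` and `dirs_to_split` are hypothetical helpers, not HBase APIs.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def server_name_from_log_dir(log_dir: str) -> str:
    """Return the last path component of a per-server log folder.

    Assumption (hypothetical): the folder is named "host,port,startcode",
    the same form as the znodes listed under /hbase/rs.
    """
    return PurePosixPath(urlparse(log_dir).path).name


def dirs_to_split(log_dirs, live_servers):
    """Return the log folders whose server is NOT in live_servers.

    live_servers is whatever the caller treats as the source of truth,
    e.g. the children of /hbase/rs in ZooKeeper or a heartbeat-derived set.
    """
    live = set(live_servers)
    return [d for d in log_dirs if server_name_from_log_dir(d) not in live]


if __name__ == "__main__":
    # Log folders and znodes copied from the report.
    log_dirs = [
        "hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/"
        "haus01.sf.cloudera.com,60020,1290500443143",
        "hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/"
        "haus02.sf.cloudera.com,60020,1290498411450",
    ]
    zk_rs = [
        "haus04.sf.cloudera.com,60020,1290498411533",
        "haus05.sf.cloudera.com,60020,1290498411520",
        "haus03.sf.cloudera.com,60020,1290498411518",
        "haus01.sf.cloudera.com,60020,1290500443143",
        "haus02.sf.cloudera.com,60020,1290498411450",
    ]
    # Only the restarted server has checked in via heartbeat.
    heartbeat_only = ["haus01.sf.cloudera.com,60020,1290500443143"]

    print(dirs_to_split(log_dirs, zk_rs))           # []
    print(dirs_to_split(log_dirs, heartbeat_only))  # just the haus02 folder
```

Checked against the ZK listing, neither folder is selected for splitting; checked against a set that holds only the restarted haus01 server, the haus02 folder is selected even though that server is alive, which matches the behavior described in the report.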

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
