hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-3266) Master does not seem to properly scan ZK for running RS during startup
Date Tue, 23 Nov 2010 20:12:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

stack updated HBASE-3266:

    Fix Version/s: 0.90.0

Bringing into 0.90.0 while we triage.

> Master does not seem to properly scan ZK for running RS during startup
> ----------------------------------------------------------------------
>                 Key: HBASE-3266
>                 URL: https://issues.apache.org/jira/browse/HBASE-3266
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.90.0
> I was in the situation described by HBASE-3265, where I had a number of RS waiting on ROOT, but the master hadn't seen any RS checkins, so was waiting on checkins. To get past this, I restarted one of the region servers. The restarted server checked in, and the master began its startup.
> At this point the master started scanning /hbase/.logs for things to split. It correctly identified that the RS on haus01 was running (this is the one I restarted):
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder belongs to an existing region server
> but then incorrectly decided that the RS on haus02 was down:
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder doesn't belong to a known region server, splitting
> However ZK shows that this RS is up:
> [zk: haus01.sf.cloudera.com:2222(CONNECTED) 3] ls /hbase/rs
> [haus04.sf.cloudera.com,60020,1290498411533, haus05.sf.cloudera.com,60020,1290498411520, haus03.sf.cloudera.com,60020,1290498411518, haus01.sf.cloudera.com,60020,1290500443143, haus02.sf.cloudera.com,60020,1290498411450]
> splitLogsAfterStartup seems to check ServerManager.onlineServers, which, as best I can tell, is derived from heartbeats and not from ZK (sorry if I got some of this wrong; I'm still new to this new codebase).
> Of course, the master went into an infinite splitting loop at this point, since haus02 is up and renewing its DFS lease on its logs.
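
A minimal sketch of the distinction described above, not HBase's actual code: the class LogSplitSketch and the method foldersToSplit are hypothetical names introduced for illustration. It assumes each folder under /hbase/.logs is named after its region server in the same host,port,startcode form that "ls /hbase/rs" prints. The idea is that a log folder should only be split when its server is absent from both the heartbeat-derived set and the set of servers registered in ZK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LogSplitSketch {

    /**
     * Hypothetical helper: returns the log folders whose server appears in
     * neither the heartbeat-derived set nor the ZK-registered set.
     * Assumption: a log folder name equals its server name.
     */
    static List<String> foldersToSplit(Collection<String> logFolders,
                                       Set<String> heartbeatServers,
                                       Set<String> zkServers) {
        // A server counts as live if either source knows about it.
        Set<String> live = new HashSet<>(heartbeatServers);
        live.addAll(zkServers);

        List<String> toSplit = new ArrayList<>();
        for (String folder : logFolders) {
            if (!live.contains(folder)) {
                toSplit.add(folder);
            }
        }
        return toSplit;
    }

    public static void main(String[] args) {
        String haus01 = "haus01.sf.cloudera.com,60020,1290500443143";
        String haus02 = "haus02.sf.cloudera.com,60020,1290498411450";

        List<String> logFolders = Arrays.asList(haus01, haus02);
        // Only the restarted server has checked in via heartbeat.
        Set<String> heartbeats = new HashSet<>(Arrays.asList(haus01));
        // ZK lists both servers as registered.
        Set<String> zk = new HashSet<>(Arrays.asList(haus01, haus02));

        // Heartbeats alone: haus02 looks dead and its logs would be split.
        System.out.println(foldersToSplit(logFolders, heartbeats, new HashSet<String>()));
        // Heartbeats plus ZK: nothing is split.
        System.out.println(foldersToSplit(logFolders, heartbeats, zk));
    }
}
```

With only the heartbeat set consulted, the haus02 folder is selected for splitting, which matches the behavior in the log excerpt; once the ZK set is also consulted, no folder is selected.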

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
