hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Hmaster had crashed as disabling table
Date Tue, 29 Mar 2011 18:20:17 GMT
Oh yeah, I see. The issue is that if a region was closed and disabled
while the first master was running, it won't be assigned anywhere and
won't be in transition either (the code calls that being in RIT). When
the new master comes up and disable is called, it checks whether the
region is in RIT but not whether it was already disabled, and it fails
with an NPE because the region isn't assigned to anyone.
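
A minimal sketch of that failure mode, not the actual HBase code: the
class DisableSketch and its methods are hypothetical names, and region
and server identities are reduced to plain strings. It only shows how a
region stored with a null server trips the close call, and how a null
check before the call avoids it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the master's region bookkeeping.
class DisableSketch {
    // Region name -> hosting server; null means "known but not assigned".
    private final Map<String, String> regions = new HashMap<>();

    void put(String region, String server) {
        regions.put(region, server);
    }

    // Mirrors the reported behaviour: no check for an unassigned region.
    String unassignUnguarded(String region) {
        return sendRegionClose(regions.get(region), region);
    }

    // Assumed remedy: skip regions that have no server to close them on.
    String unassignGuarded(String region) {
        String server = regions.get(region);
        if (server == null) {
            return "skipped " + region + " (not assigned)";
        }
        return sendRegionClose(server, region);
    }

    // Same message as in the reported stack trace.
    static String sendRegionClose(String server, String region) {
        if (server == null) {
            throw new NullPointerException("Passed server is null");
        }
        return "close " + region + " on " + server;
    }
}
```

With a region registered as put("r2", null), unassignUnguarded("r2")
throws the NullPointerException, while unassignGuarded("r2") just skips
the region.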

Calling enable before disable should get you out of the situation?

Would you mind opening a JIRA?

Thx!

J-D

2011/3/28 Gaojinchao <gaojinchao@huawei.com>:
> Hbase version is 0.90.1
>
>
> private Map<HServerInfo,List<Pair<HRegionInfo,Result>>> rebuildUserRegions()
>  throws IOException {
>    // Region assignment from META
>    List<Result> results = MetaReader.fullScanOfResults(catalogTracker);
>    // Map of offline servers and their regions to be returned
>    Map<HServerInfo,List<Pair<HRegionInfo,Result>>> offlineServers =
>      new TreeMap<HServerInfo,List<Pair<HRegionInfo,Result>>>();
>    // Iterate regions in META
>    for (Result result : results) {
>      Pair<HRegionInfo,HServerInfo> region =
>        MetaReader.metaRowToRegionPairWithInfo(result);
>      if (region == null) continue;
>      HServerInfo regionLocation = region.getSecond();
>      HRegionInfo regionInfo = region.getFirst();
>      if (regionLocation == null) {
>        // Region not being served, add to region map with no assignment
>        // If this needs to be assigned out, it will also be in ZK as RIT
>        this.regions.put(regionInfo, null);
>        // ---- This looks like a bug in the special case where the HMaster
>        // restarts or fails over.
>      } else if (!serverManager.isServerOnline(
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: March 29, 2011 1:02
> To: user@hbase.apache.org
> Subject: Re: Hmaster had crashed as disabling table
>
> Which HBase version is this?
>
> Thx,
>
> J-D
>
> 2011/3/28 Gaojinchao <gaojinchao@huawei.com>:
>> When the master restarts or fails over, it refreshes the user regions.
>> There seems to be a bug in that path.
>>
>>
>> if (regionCount == 0) {
>>      LOG.info("Master startup proceeding: cluster startup");
>>      this.assignmentManager.cleanoutUnassigned();
>>      this.assignmentManager.assignAllUserRegions();
>>    } else {
>>      LOG.info("Master startup proceeding: master failover");
>>      this.assignmentManager.processFailover();  // -- on master restart or failover, this refreshes the user regions
>>    }
>>
>>
>> -----Original Message-----
>> From: Gaojinchao [mailto:gaojinchao@huawei.com]
>> Sent: March 28, 2011 11:41
>> To: user@hbase.apache.org
>> Subject: Hmaster had crashed as disabling table
>>
>> Steps to reproduce:
>> 1. Start up a cluster with an HA master.
>> 2. The active master crashed while it was creating a table with regions.
>> 3. The backup master became active.
>> 4. I tried to drop the table.
>> 5. The active master crashed.
>>
>> I can't drop the table, whatever I do.
>>
>> The log as:
>>
>>
>> 2011-03-28 10:51:58,347 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ufdr
>> 2011-03-28 10:51:58,374 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 470 regions.
>> 2011-03-28 10:51:58,377 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,,1301128408707.a9d08c22b8a7b0f902ccffce424252fd. (offlining)
>> 2011-03-28 10:51:58,378 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613810384615,1301128408710.ba1a5fef02bd67b5630802fb2c5707a6. (offlining)
>> 2011-03-28 10:51:58,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613810769230,1301128408710.12d027d3c1934f3fd76ef48915461569. (offlining)
>> 2011-03-28 10:51:58,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613811153845,1301128408710.9700c58da2d0d1c9306b1d1ff832be1d. (offlining)
>> 2011-03-28 10:51:58,384 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613811538460,1301128408710.862232de569b0c8efdac7ea350f30974. (offlining)
>> 2011-03-28 10:51:58,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613811923075,1301128408711.54772ed69d8315e3a562d5e98bf61955. (offlining)
>> 2011-03-28 10:51:58,385 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
>> java.lang.NullPointerException: Passed server is null
>>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:581)
>>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
>>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1040)
>>         at org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler$1.run(DisableTableHandler.java:132)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2011-03-28 10:51:58,386 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613814230765,1301128408711.2455e205497987dd83f40869c2bf0615. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613813846150,1301128408711.5ea78e17593d2d0d8260fc1b2f58bf7c. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613813461535,1301128408711.dae370c25610aca41ea060db7333f519. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613813076920,1301128408711.a608be1914b40d8aa08d9ffb649826d3. (offlining)
>> 2011-03-28 10:51:58,387 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613814999995,1301128408711.82f646d34013ea99b244e9e1837c4e04. (offlining)
>> 2011-03-28 10:51:58,386 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613812307690,1301128408711.535d47842e7ad7b35be6e98e5f46b407. (offlining)
>> 2011-03-28 10:51:58,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region ufdr,0008613812692305,1301128408711.4efc83d4a99eff9d4df7ce0154ef4c58. (offlining)
>> 2011-03-28 10:51:58,385 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
>> java.lang.NullPointerException: Passed server is null
>>
>
