hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2065) Cannot disable a table if any of its region is opening at the same time
Date Wed, 23 Dec 2009 06:37:29 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2065:
--------------------------------------

    Attachment: HBASE-2065-2.patch

This new patch attempts to fix the issue seen in Hudson (a region being opened while its table
was being disabled fell through the cracks) as well as a new issue found with more testing: a region
closed and set unassigned by the master is seen by TableOperation as an assigned region, since
the server info is still there, so ChangeTableState then switches it from "unassigned" to "closing".
The region is never reassigned nor seen as closed.

More comments:

TestAdmin:
- I added a more difficult test where we enable and disable to root out more issues.

HRegion:
- Refactored offlineRegionInMETA to add a new method called removeServerInfoInMETA which does
pretty much that.

ProcessRegionClose:
- Calls the new HRegion method in order to clean the .META. entry.

ChangeTableState:
- When disabling a table, if we see a pending open region we no longer attempt to mark it
as "closing", since the master would be confused when the region server reports
opening the region. We will rely now on the next modif in HBA...
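
The ChangeTableState rule above could be sketched roughly as follows. This is only an
illustration, not the code in the patch: DisableFilterSketch, RegionState and regionsToClose
are hypothetical names, and the three states are a simplification of what the master
actually tracks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the HBase code base. */
final class DisableFilterSketch {

    /** Simplified, assumed view of the state the master holds for a region. */
    enum RegionState { ASSIGNED, PENDING_OPEN, UNASSIGNED }

    /**
     * Picks the regions that a disable pass may mark as "closing".
     * Pending open regions are skipped so the master is not confused when
     * the region server later reports the open; unassigned regions have
     * nothing to close. Both are left for a later pass.
     */
    static List<String> regionsToClose(Map<String, RegionState> regions) {
        List<String> toClose = new ArrayList<>();
        for (Map.Entry<String, RegionState> entry : regions.entrySet()) {
            if (entry.getValue() == RegionState.ASSIGNED) {
                toClose.add(entry.getKey());
            }
        }
        return toClose;
    }
}
```

A skipped region is not forgotten in this sketch; it is expected to be picked up by a later
disable request once it has finished opening.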

HBaseAdmin:
- When enabling/disabling, instead of calling the Master's method once and waiting, we now
call it on every iteration to take into account regions that are in transition, such as a pending open.
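
As a rough illustration of the HBaseAdmin change, here is a minimal sketch of re-issuing
the request on every iteration. It is not the patched code: DisableRetrySketch, MasterStub,
requestDisable and disableWithRetries are hypothetical names, and the retry count and pause
are assumed parameters.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names come from the HBase code base. */
final class DisableRetrySketch {

    /** Stand-in for the master call that asks for a table to be disabled. */
    interface MasterStub {
        void requestDisable(String tableName);
    }

    /**
     * Re-issues the disable request on every iteration until the table
     * reports as disabled or the retry budget runs out.
     *
     * @return true if the table was seen disabled within maxRetries attempts
     */
    static boolean disableWithRetries(MasterStub master, String tableName,
                                      BooleanSupplier isDisabled,
                                      int maxRetries, long pauseMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            // Ask again each time, so a region that was pending open on the
            // previous pass is covered once it has settled.
            master.requestDisable(tableName);
            if (isDisabled.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The point of the design is that a single request followed by a passive wait can miss a
region that was in transition when the request was processed, whereas repeating the
request gives the master another chance to handle it.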


> Cannot disable a table if any of its region is opening at the same time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-2065
>                 URL: https://issues.apache.org/jira/browse/HBASE-2065
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.20.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: HBASE-2065-2.patch, HBASE-2065-branch.patch, HBASE-2065.patch
>
>
> Also found with the test in the parent jira:
> {code}
> 2009-12-21 18:31:44,411 INFO  [IPC Server handler 0 on 60000] master.RegionManager(331):
Assigning region table113,,1261449026166 to 10.10.1.54,60853,1261448823301
> 2009-12-21 18:31:44,411 INFO  [IPC Server handler 0 on 60000] master.RegionManager(331):
Assigning region table121,,1261449041385 to 10.10.1.54,60853,1261448823301
> 2009-12-21 18:31:44,411 INFO  [RegionServer:1] regionserver.HRegionServer(475): MSG_REGION_OPEN:
table113,,1261449026166
> 2009-12-21 18:31:44,411 INFO  [RegionServer:1] regionserver.HRegionServer(475): MSG_REGION_OPEN:
table121,,1261449041385
> ...
> 2009-12-21 18:31:44,418 INFO  [RegionServer:1.worker] regionserver.HRegion(343): region
table113,,1261449026166/21044806 available; sequence id is 0
> ...
> 2009-12-21 18:31:44,445 DEBUG [IPC Server handler 4 on 60000] master.ChangeTableState(121):
Adding region table113,,1261449026166 to setClosing list
> 2009-12-21 18:31:44,446 DEBUG [main] zookeeper.ZooKeeperWrapper(392): Read ZNode /hbase/root-region-server
got 10.10.1.54:60853
> 2009-12-21 18:31:44,447 DEBUG [main] client.HConnectionManager$TableServers(990):
Found ROOT at 10.10.1.54:60853
> 2009-12-21 18:31:44,447 DEBUG [main] client.HConnectionManager$TableServers(899): Cached
location for .META.,,1 is 10.10.1.54:60855
> 2009-12-21 18:31:44,453 DEBUG [main] client.HConnectionManager$TableServers(554):
Rowscanned=1, rowsOffline=0
> 2009-12-21 18:31:44,454 DEBUG [main] client.HBaseAdmin(397): Sleep. Waiting for all regions
to be disabled from table113
> 2009-12-21 18:31:44,554 DEBUG [main] client.HBaseAdmin(406): Wake.
Waiting for all regions to be disabled from table113
> ...
> 2009-12-21 18:31:44,642 INFO  [RegionServer:0] regionserver.HRegionServer(475): MSG_REGION_CLOSE:
table113,,1261449026166
> ...
> 2009-12-21 18:31:44,642 INFO  [RegionServer:0.worker] regionserver.HRegionServer$Worker(1332):
Worker: MSG_REGION_CLOSE: table113,,1261449026166
> ...
> 2009-12-21 18:31:44,664 INFO  [IPC Server handler 0 on 60000] master.ServerManager(421):
Processing MSG_REPORT_PROCESS_OPEN: table113,,1261449026166 from 10.10.1.54,60853,1261448823301;
1 of 4
> ...
> 2009-12-21 18:31:44,664 INFO  [IPC Server handler 0 on 60000] master.ServerManager(421):
Processing MSG_REPORT_OPEN: table113,,1261449026166 from 10.10.1.54,60853,1261448823301; 3
of 4
> 2009-12-21 18:31:44,664 DEBUG [IPC Server handler 0 on 60000] master.ServerManager(562):
region server 10.10.1.54:60853 should not have opened region table113,,1261449026166
> 2009-12-21 18:31:44,666 INFO  [RegionServer:1] regionserver.HRegionServer(475): MSG_REGION_CLOSE_WITHOUT_REPORT:
table113,,1261449026166: Duplicate assignment
> 2009-12-21 18:31:44,666 INFO  [RegionServer:1.worker] regionserver.HRegionServer$Worker(1332):
Worker: MSG_REGION_CLOSE_WITHOUT_REPORT: table113,,1261449026166: Duplicate assignment
> {code}
> Here the master reassigned table113 and told the old region server to close the region
before the new one was able to report that it opened it. At the end the new region server
(the good one) is also told to close it. After that my test times out; table113 is neither
disabled nor deployed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

