hbase-issues mailing list archives

From "cuijianwei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13814) AssignmentManager does not write the correct server name into Zookeeper when unassign region
Date Wed, 24 Jun 2015 11:05:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599244#comment-14599244
] 

cuijianwei commented on HBASE-13814:
------------------------------------

Thanks for taking a look, [~lhofhansl], and sorry for the late reply :). Yes, master.getServerName()
is not the name of the server that is serving the region. I think we need to save the correct region
server name into the znode in AssignmentManager#unassign so that AssignmentManager#isCarryingRegion
returns the right result.
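
To illustrate why the name stored in the znode matters, here is a minimal sketch that mirrors only the comparison quoted in the issue description. The class name, the method name and the server-name strings are hypothetical stand-ins, and plain strings are used instead of ServerName; this is not the actual AssignmentManager code.

```java
class CarryingRegionSketch {
    // Mirrors the quoted check: when ZooKeeper returns an origin,
    // it is compared against the name of the crashed server.
    static boolean matchesZk(String originFromZk, String crashedServer) {
        return originFromZk != null && originFromZk.equals(crashedServer);
    }

    public static void main(String[] args) {
        // Hypothetical server names, for illustration only.
        String master = "master.example.org,60000,1";
        String regionServer = "rs1.example.org,60020,1";

        // Znode written with the master's name: the crashed region server does not match.
        System.out.println("origin=master       -> " + matchesZk(master, regionServer));
        // Znode written with the region server's name: the crashed region server matches.
        System.out.println("origin=regionServer -> " + matchesZk(regionServer, regionServer));
    }
}
```

With the master's name as the origin the comparison fails, which corresponds to the crashed server not being treated as the one carrying the region.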

{quote}
I think we can simplify:
{code}
+      if (!regions.containsKey(region) || (serverName = regions.get(region)) == null) {
{code}
To
{code}
+      if ((serverName = regions.get(region)) == null) {
{code}
{quote}
Yes, it looks better, I will update the patch.
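
As a quick sanity check of the simplification, the sketch below compares both forms of the condition on a plain java.util.Map. The String keys and values are assumptions standing in for the real region and server name types, and the class and method names are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class LookupSketch {
    // Original form: explicit containsKey check plus null check on the value.
    static boolean notAssignedVerbose(Map<String, String> regions, String region) {
        return !regions.containsKey(region) || regions.get(region) == null;
    }

    // Simplified form: Map#get returns null for a missing key as well.
    static boolean notAssignedSimple(Map<String, String> regions, String region) {
        return regions.get(region) == null;
    }

    public static void main(String[] args) {
        Map<String, String> regions = new HashMap<>();
        regions.put("regionA", "rs1.example.org,60020,1"); // assigned
        regions.put("regionB", null);                      // key present, value null
        // "regionC" is deliberately absent from the map.

        for (String region : new String[] {"regionA", "regionB", "regionC"}) {
            System.out.println(region
                + ": verbose=" + notAssignedVerbose(regions, region)
                + " simple=" + notAssignedSimple(regions, region));
        }
    }
}
```

Both forms agree in all three cases (assigned, present with a null value, and absent), because Map#get returns null when the key is missing.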

{quote}
Can we move:
{code}
+    ServerName serverName = null;
{code}
Inside the synchronized?
{quote}

Do you mean moving the declaration inside the synchronized block, like this:
{code}
synchronized (this.regions) {
  // Check if this region is currently assigned
  if ((serverName = regions.get(region)) == null) {
    ...
{code}
However, serverName is also used in a second synchronized block:
{code}
synchronized (regionsInTransition) {
  state = regionsInTransition.get(encodedName);
  if (state == null) {
    // Create the znode in CLOSING state
    try {
      versionOfClosingNode = ZKAssign.createNodeClosing(
        master.getZooKeeper(), region, serverName); // ===> serverName is needed here
{code}
so serverName would not be visible there if it were declared inside the first synchronized block?
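
A minimal, self-contained sketch of the scoping concern, using hypothetical maps and String names in place of the real AssignmentManager fields: the variable is declared before the first synchronized block so that the second block can still read it.

```java
import java.util.HashMap;
import java.util.Map;

class UnassignScopeSketch {
    // Hypothetical stand-ins for the two maps guarded by separate locks.
    private final Map<String, String> regions = new HashMap<>();
    private final Map<String, String> regionsInTransition = new HashMap<>();

    UnassignScopeSketch() {
        regions.put("regionA", "rs1.example.org,60020,1");
    }

    /** Returns the server name that would go into the CLOSING znode, or null if not assigned. */
    String unassign(String region) {
        // Declared outside both blocks; a declaration inside the first
        // synchronized block would be out of scope in the second one.
        String serverName = null;
        synchronized (regions) {
            if ((serverName = regions.get(region)) == null) {
                return null; // region is not currently assigned
            }
        }
        synchronized (regionsInTransition) {
            if (regionsInTransition.get(region) == null) {
                // The real code creates the CLOSING znode here; this sketch only records a marker.
                regionsInTransition.put(region, "CLOSING@" + serverName);
            }
        }
        return serverName;
    }

    public static void main(String[] args) {
        UnassignScopeSketch sketch = new UnassignScopeSketch();
        System.out.println(sketch.unassign("regionA"));
        System.out.println(sketch.unassign("unknownRegion"));
    }
}
```

Moving the declaration of serverName into the first block makes the reference in the second block a compile error, which is why the sketch keeps it at method scope.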

> AssignmentManager does not write the correct server name into Zookeeper when unassign
region
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13814
>                 URL: https://issues.apache.org/jira/browse/HBASE-13814
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.94.27
>            Reporter: cuijianwei
>            Priority: Minor
>         Attachments: HBASE-13814-0.94-v1.patch
>
>
> When a region is moved, it is first unassigned from its region server by the method
> AssignmentManager#unassign(). AssignmentManager writes the region info and the server name
> into ZooKeeper with the following code:
> {code}
>           versionOfClosingNode = ZKAssign.createNodeClosing(
>             master.getZooKeeper(), region, master.getServerName());
> {code}
> It seems that AssignmentManager mistakenly uses the master's name as the server name. Suppose
> the ROOT region is being moved and the region server holding the ROOT region has just crashed.
> The master will try to start a MetaServerShutdownHandler if the server is judged to be holding
> a meta region. That judgment is made by the method AssignmentManager#isCarryingRegion, which
> first checks the server name in ZooKeeper:
> {code}
>     ServerName addressFromZK = (data != null && data.getOrigin() != null) ?
>       data.getOrigin() : null;
>     if (addressFromZK != null) {
>       // if we get something from ZK, we will use the data
>       boolean matchZK = (addressFromZK != null &&
>         addressFromZK.equals(serverName));
> {code}
> Because the server name read from ZooKeeper is wrong, the server is not judged to be holding
> the ROOT region, so the master starts a ServerShutdownHandler instead. Unlike
> MetaServerShutdownHandler, ServerShutdownHandler does not assign the ROOT region first, so the
> ROOT region is never assigned. In our test environment, we hit this problem when moving the
> ROOT region and stopping the region server concurrently.



