hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7390) Add extra test cases for assignement on the region server and fix the related issues
Date Fri, 21 Dec 2012 13:55:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537874#comment-13537874

nkeywal commented on HBASE-7390:

bq. I'm not sure I follow.
      Boolean openAction = regionsInTransitionInRS.get(encodedName);
      if (openAction != null) {         // If this region is already in transition
        if (openAction.booleanValue()) { // and is opening         
          regionsInTransitionInRS.replace(encodedName, openAction, Boolean.FALSE);
          // say it's now closing
        checkIfRegionInTransition(encodedName, CLOSE); // throw an exception
        // we will never go there, we always throw an exception at the previous line
        // So if we were opening, we now have a OpenRegionHandler running but the RIT state
says closing.
        // What's going to happen to the znode now?
      // If we're there, it means there is nothing in RIT    
      LOG.info("Received close region: " + region.getRegionNameAsString() +
        ". Version of ZK closing node:" + versionOfClosingNode +
        ". Destination server:" + sn);
      HRegionInfo regionInfo = region.getRegionInfo();         
      checkIfRegionInTransition(encodedName, CLOSE); // We've just checked 0.0001 ms ago!

I looked at it more, it's actually by design: the OpenRegionHandler checks for the content
of this variable. So the behavior is: 'if we received the request to close a region we're
currently opening, stop the opening but raise an exception saying we can't close again because
we were already closing".
Issues are:
- What's going to happen with the znode? It seems it won't be changed in the openHandler for
example. So it will remain forever.
- It's a little bit racy: if we were at the end of the opening, we can get the exception but
the opening will be done.
- worse, the closeRegion call will be rejected at its very beginning, because when we try
to do the "getRegionByEncodedName", it throws a "RegionIsNotOnline"
  => In many cases, the opening will continue, as if the region is opening it should not
be in the onlineRegions list.
  => It could be an impact of the coprocessor implementation.
  => As a side node, as the closeRegion closes are nested, the coprocessors "preClose"
will be called multiple times.
  => As another side node, the way it's coded in the other closeRegion is that we don't
call the coprocessors when the region is not online
  => So it means that we don"t check the access rigts if the region is not online, it allows
to cancel an opening without the AR.
- As well, we had no test case (that's why my patch, while broken, worked).
I'm trying to solve this in the last version.

bq. + zkw.sync(encoded);
It's not absolutely necessary, but it maximizes corectness probability. It's done in all other
cases as well, so not doing it will would be bad imho, as it would break consistency.

bq. Review comments. Some error. e.g. + * @param expectedVersion expecpted version og the
znode It looks like klingon.

bq. This below continue is right?
I've changed the implementation to remove the continue. I found another issue while doing
so. Previously, when the region was already in RIT and closing, we were saying "OPENED" on
a bulkAssign (that is: the query was open, and we were saying 'opened' while closing the region).
This seems wrong. We now say "FAILED_OPENING" in this case.

bq. On the patch, what Sergey said about different closeRegion returns. Could the one that
throws an exception be private?
I looked at it in more details. Actually, what returns this function is never checked. So
I've renamed it and clean stuff around it.

bq. Would putIfAbsent be easier to read if it took an enum rather than a boolean to say whether
open or close
It's was to be compatible with RIT design. I removed the method. I tried to change RIT, but
its type is public through the getter, so I prefer not doing it in a patch.

I will publish an updated version soon.
> Add extra test cases for assignement on the region server and fix the related issues
> ------------------------------------------------------------------------------------
>                 Key: HBASE-7390
>                 URL: https://issues.apache.org/jira/browse/HBASE-7390
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>             Fix For: 0.96.0
>         Attachments: 7390.v1.patch, 7390.v2.patch, 7390.v3.patch, 7390.v4.patch, assignment_zk_states.jpg
> We don't have a lot of tests on the region server itself.
> Here are some.
> Some of them are failing, feedback welcome.
> See as well the attached state diagram for the ZK nodes on assignment.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message