hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy
Date Wed, 24 Aug 2011 06:20:29 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090021#comment-13090021
] 

stack commented on HBASE-4015:
------------------------------

@Ted In most other state transitions, we have the znode version to hand (we previously read
the znode state).  Its this one operation where we are transferring region ownership from
master to regionserver that has the hole.  A new state "seems" to be a patch on the actual
problem but I could be convinced otherwise (Looking at Ram's diagram the new state seemed
cleaner but I had sneaking suspicion that the new state introduces a bunch of new conditions
not dealt with in the diagram).

On the patch (Thanks for digging in here Ram):

Why the special handling of meta regions here:

{code}
-        regionsInTransition.put(encodedRegionName, new RegionState(
-            regionInfo, RegionState.State.OPENING,
-            data.getStamp(), data.getOrigin()));
+        if (regionInfo.isMetaRegion() || regionInfo.isRootRegion()) {
+          regionsInTransition.put(encodedRegionName, new RegionState(regionInfo,
+              RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
+          processOpeningState(regionInfo);
+          break;
+        }
+        regionsInTransition.put(encodedRegionName, new RegionState(regionInfo,
+            RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
{code}

FYI, IIRC, regionInfo.isMetaRegion() is true if .META. or -ROOT-

This is ok, not setting a watcher here?

{code}
+      RegionTransitionData dataInZNode = ZKAssign.getDataNoWatch(watcher, node,
+          stat);
{code}

Here you might want to add logging whether or not its a reallocate:

{code}
-      LOG.info("Table " + tableName + (disabled? " disabled;": " disabling;") +
-        " skipping assign of " + region.getRegionNameAsString());
+      LOG.info("Table " + tableName + (disabled ? " disabled;" : " disabling;")
+          + " skipping assign of " + region.getRegionNameAsString());
{code}

FYI, I prefer the previous style where the '+' is at the end of the line; it flags reader
that there is more to come.... (IMO).

What is happening here?  Who will have set realloc true?  The timeout monitor?  So who then
will set it to OFFLINE?

{code}
-      LOG.debug("Forcing OFFLINE; was=" + state);
-      state.update(RegionState.State.OFFLINE);
+      // If invoked from timeout monitor donot force it to OFFLINE. Based on the
+      // state
+      // we will decide if to change in-memory state to OFFLINE or not.
+      if (!isReAllocate) {
+        LOG.debug("Forcing OFFLINE; was=" + state);
+        state.update(RegionState.State.OFFLINE);
+      }
{code}

When would this happen:

{code}
+      if (setOfflineInZK && versionOfOfflineNode == -1)
+        return;
{code}

i.e... it would come back w/ -1?

Style: If multiple lines, it needs curly brackets (Would suggest leaving stuff like this alone)

{code}
-      if (plan == null) return; // Should get reassigned later when RIT times out.
+      if (plan == null)
+        return; // Should get reassigned later when RIT times out.
{code}

At the other extreme, its fine to fix stuff like this where its w/o spaces around the '+'
operator:

{code}
-            if(LOG.isDebugEnabled()){
-              LOG.debug("The unassigned node "+encodedRegionName+" doesnot exist.");
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("The unassigned node " + encodedRegionName
+                  + " doesnot exist.");
{code}

(though you could separate the 'doesnot' into 'does not'

This kinda change just bloats your patch (making Ted think it does more than it does -- smile):

{code}
-        LOG.warn("Failed assignment of " +
-          state.getRegion().getRegionNameAsString() + " to " +
-          plan.getDestination() + ", trying to assign elsewhere instead; " +
-          "retry=" + i, t);
+        LOG.warn("Failed assignment of "
+            + state.getRegion().getRegionNameAsString() + " to "
+            + plan.getDestination() + ", trying to assign elsewhere instead; "
+            + "retry=" + i, t);
{code}

ditto

{code}
-          LOG.warn("Unable to find a viable location to assign region " +
-            state.getRegion().getRegionNameAsString());
+          LOG.warn("Unable to find a viable location to assign region "
+              + state.getRegion().getRegionNameAsString());
{code}

Does the param 'isReAllocate' mean method was called from timeout monitor?  If so, it probably
warrants a different name?

What is going to happen if we return -1 because of below?

{code}
+      if (versionOfOfflineNode == -1) {
+        LOG.warn("Attempted to create/force node into OFFLINE state before "
+            + "completing assignment but failed to do so for " + state);
+        return -1;
       }
{code}

+1 on refactor of timeout breaking out the method actOnTimeout.

Did you change anything in way timeout monitor works?

This is interesting:

{code}
+  public static enum TimeOutOperationType {
+    ASSIGN, UNASSIGN;
+  }
{code}

We have timeout types.  How does this work?  Is this an in-memory master state only?  What
happens if master crashes?  Will new master need to know these two new states?

Why are we passing AssignmentManager a threadPoolExecutorService?  Its used by timeout monitor?
 Should creation of the executor service just live in the timeout monitor and be constructed
there rather than up in master and passed in?  Or is it that you need to be able to shut it
down when master goes down?

Add a comment on TimeOutManagerCallable class saying what its for.

This is great:

{code}
+      if (!transitionZookeeperOfflineToOpening(encodedName,
+          versionOfOfflineNode)) {
{code}

... over on the RS.

Is this code copy paste from the method above it?  The version that does not take a znode
version?

{code}
+  public RegionOpeningState openRegion(HRegionInfo region, int expectedVersion)
{code}

If so, we should fix that.

This patch needs closer review and I'd like to know how this executor service in timeout monitor
solve races we've seen (it move operations out of line w/ processing of the actual timeout
event?) but I'm thinking this patch looks good; just what the doctor ordered, at least around
passing of the version.  What are your reservations Ram?  How hard would it be to add tests?

Good stuff.



> Refactor the TimeoutMonitor to make it less racy
> ------------------------------------------------
>
>                 Key: HBASE-4015
>                 URL: https://issues.apache.org/jira/browse/HBASE-4015
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.90.3
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, Timeoutmonitor
with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition generator,
mostly making things worse rather than better. It does it's own thing for a while without
caring for what's happening in the rest of the master.
> The first thing that needs to happen is that the regions should not be processed in one
big batch, because that sometimes can take minutes to process (meanwhile a region that timed
out opening might have opened, then what happens is it will be reassigned by the TimeoutMonitor
generating the never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure how to do
it in a scalable way in this case.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message