hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3159) Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master
Date Wed, 27 Oct 2010 23:36:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925605#action_12925605 ]

Jonathan Gray commented on HBASE-3159:
--------------------------------------

Stack, when you rerun your tests, turn off the ZK client logging and make sure all of our
ZK logging is set to pick up DEBUG level.  There is always a slight chance we will want the
raw ZK client logs if something really crazy is happening, but there should be enough logging
in our ZKW and ZKUtil as long as they log at DEBUG.
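
For reference, a log4j.properties fragment along these lines should do it. This is a sketch that assumes the stock log4j 1.x properties file and the usual package names (org.apache.zookeeper for the client, org.apache.hadoop.hbase.zookeeper for ZKW and ZKUtil). WARN is an arbitrary choice for quieting the client; adjust to taste.

{noformat}
# Quiet the raw ZooKeeper client library (assumed level; raise or lower as needed)
log4j.logger.org.apache.zookeeper=WARN
# Keep HBase's own ZooKeeper wrappers (ZooKeeperWatcher, ZKUtil) at DEBUG
log4j.logger.org.apache.hadoop.hbase.zookeeper=DEBUG
{noformat}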

One thing though, change the method at the bottom of ZKUtil to the following:

{noformat}
  private static void logRetrievedMsg(final ZooKeeperWatcher zkw,
      final String znode, final byte [] data, final boolean watcherSet) {
    // Skip building the message entirely unless DEBUG is on
    if (!LOG.isDebugEnabled()) return;
    LOG.debug(zkw.prefix("Retrieved " + ((data == null)? 0: data.length) +
      " byte(s) of data from znode " + znode +
      (watcherSet? " and set watcher; ": "; data=") +
      (data == null? "null": (
          // Under the assignment znode: print the decoded region transition data
          znode.startsWith(zkw.assignmentZNode) ?
              RegionTransitionData.fromBytes(data).toString()
              // Anything else: print the raw data as a string, cut to 32 chars
              : StringUtils.abbreviate(Bytes.toString(data), 32)))));
  }
{noformat}

The change is that we detect when we are logging an unassigned znode and, if so, print the
decoded region transition data instead of the abbreviated raw bytes.  That should make this
much simpler to debug.
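
To show the branching on its own, here is a stand-alone sketch of the same decision with no HBase classes on the classpath. Everything in it is a stand-in and not the real API: ZnodeLogSketch, retrievedMsg, and the transitionDecoder parameter (which plays the role of RegionTransitionData.fromBytes) are hypothetical names, and abbreviate is a simplified local substitute for StringUtils.abbreviate.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Stand-alone sketch; none of these names are HBase APIs. */
final class ZnodeLogSketch {

  /** Cut s to at most maxWidth chars, ending in "..." when shortened (maxWidth >= 4). */
  static String abbreviate(String s, int maxWidth) {
    if (s.length() <= maxWidth) return s;
    return s.substring(0, maxWidth - 3) + "...";
  }

  /**
   * Builds the debug message. transitionDecoder is a hypothetical stand-in
   * for RegionTransitionData.fromBytes(data).toString().
   */
  static String retrievedMsg(String znode, String assignmentZNode, byte[] data,
      boolean watcherSet, Function<byte[], String> transitionDecoder) {
    String payload;
    if (data == null) {
      payload = "null";
    } else if (znode.startsWith(assignmentZNode)) {
      // Znode lives under the assignment znode: show the decoded transition
      payload = transitionDecoder.apply(data);
    } else {
      // Any other znode: show the raw data as text, cut to 32 chars
      payload = abbreviate(new String(data, StandardCharsets.UTF_8), 32);
    }
    return "Retrieved " + (data == null ? 0 : data.length)
        + " byte(s) of data from znode " + znode
        + (watcherSet ? " and set watcher; " : "; data=")
        + payload;
  }
}
```

Only the prefix check decides which formatter runs, so any znode outside the assignment znode still gets the abbreviated output.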

> Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3159
>                 URL: https://issues.apache.org/jira/browse/HBASE-3159
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-meta-dupe-opened-master-only.txt, hbase-meta-dupe-opened.txt
>
>
> Here is master log with annotations: http://people.apache.org/~stack/master.txt
> Region in question is:
> b8827a67a9d446f345095d25e1f375f7
> The running code is doctored in that I've added a bit of logging -- zk in particular
> -- and I've also removed what I thought was a provocation of this condition: a reassign
> inside an assign if the server has gone away when we try the open rpc (turns out we have
> the condition even w/o this code in place).
> The log starts where the region in question times out in RIT.
> We assign it to 186.
> Notice how we see 'Handling transition' for this region TWICE.  This means two
> OpenedRegionHandlers will be scheduled -- and so the failure to delete a znode already gone.
> As best I can tell, the watcher for this region is triggered once only -- which is odd:
> how then the double scheduling of OpenedRegionHandler, and why am I not seeing
> OPENING, OPENING, OPENED, but only what I presume is an OPENED?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

