hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3159) Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master
Date Thu, 28 Oct 2010 19:14:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925920#action_12925920 ]

HBase Review Board commented on HBASE-3159:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1108/
-----------------------------------------------------------

Review request for hbase and Jonathan Gray.


Summary
-------

Here is the patch I've been testing up on the cluster. It adds debugging and two fixes: the first sets the region state to OPEN on receipt of a regionserver OPENED event; the second resets the watcher when getting data in ZKUtil (as its documentation claims it does).
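One plausible reading of the first fix is sketched below, assuming the master keeps an in-memory map of region states: if the region is marked OPEN as soon as its opened event is received, a duplicate event for the same region can be recognized and dropped instead of scheduling a second OpenedRegionHandler. The class `RegionStateSketch`, its `onOpenedEvent` method, the `State` enum and the handler counter are all hypothetical stand-ins; this is not the code in the patch or in AssignmentManager.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not HBase code): mark a region OPEN as soon as its
 * "opened" event arrives, so a repeat of that event does not schedule a
 * second OpenedRegionHandler-like task.
 */
public class RegionStateSketch {
  enum State { PENDING_OPEN, OPENING, OPEN }

  private final Map<String, State> regionStates = new ConcurrentHashMap<>();
  // Stand-in for the executor: counts how many handlers were "scheduled".
  final AtomicInteger handlersScheduled = new AtomicInteger();

  void setState(String encodedName, State state) {
    regionStates.put(encodedName, state);
  }

  /** Returns true if a handler was scheduled for this event. */
  boolean onOpenedEvent(String encodedName) {
    // put() is atomic and returns the previous state, so two racing events
    // cannot both see a non-OPEN previous value.
    State previous = regionStates.put(encodedName, State.OPEN);
    if (previous == State.OPEN) {
      return false; // duplicate event: region already marked OPEN, skip it
    }
    handlersScheduled.incrementAndGet(); // would submit the handler here
    return true;
  }

  public static void main(String[] args) {
    RegionStateSketch am = new RegionStateSketch();
    String region = "b8827a67a9d446f345095d25e1f375f7";
    am.setState(region, State.OPENING);
    System.out.println(am.onOpenedEvent(region));  // true
    System.out.println(am.onOpenedEvent(region));  // false
    System.out.println(am.handlersScheduled.get()); // 1
  }
}
```

The atomic `put` is the design choice that matters here: a separate "check, then set" would let two concurrent events both pass the check.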


This addresses bug hbase-3159.
    http://issues.apache.org/jira/browse/hbase-3159


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 30e49c8 
  src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 1a88700 
  src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java d4fa82b 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 81661ef 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 21a4256 

Diff: http://review.cloudera.org/r/1108/diff


Testing
-------

Running on cluster


Thanks,

stack




> Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3159
>                 URL: https://issues.apache.org/jira/browse/HBASE-3159
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-meta-dupe-opened-master-only.txt, hbase-meta-dupe-opened.txt, master-root-assign-abort.log, rs_death_on_meta_open_no_root.txt, TestRollingRestart-v4.patch
>
>
> Here is master log with annotations: http://people.apache.org/~stack/master.txt
> Region in question is:
> b8827a67a9d446f345095d25e1f375f7
> The running code is doctored: I've added a bit of logging (zk in particular), and I've also removed what I thought was a provocation of this condition, namely a reassign inside an assign if the server has gone away when we try the open rpc. (Turns out we have the condition even without this code in place.)
> The log starts where the region in question times out in RIT (regions in transition).
> We assign it to 186.
> Notice how we see 'Handling transition' for this region TWICE. This means two OpenedRegionHandlers get scheduled, hence the failure when the second one tries to delete a znode that is already gone.
> As best I can tell, the watcher for this region is triggered only once. That is odd: how, then, does OpenedRegionHandler get scheduled twice? And why am I not seeing OPENING, OPENING, OPENED, but only what I presume is an OPENED?
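The second question (seeing only an OPENED instead of OPENING, OPENING, OPENED) is consistent with ZooKeeper's documented watch semantics: a watch is a one-time trigger, and the event it delivers carries no data, so a client that has not re-registered misses intermediate changes and, when it finally reads, sees only the latest value. The self-contained sketch below models this with a hypothetical in-memory `ToyZnode` class; it is not the ZooKeeper or HBase API. It contrasts a watch set once with a watch re-set on every data read, which mirrors the ZKUtil fix described in the summary. It does not, by itself, explain why OpenedRegionHandler was scheduled twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical in-memory model of a znode with ZooKeeper-style one-time
 * watches. Illustrates the semantics only; not the real client API.
 */
public class ToyZnode {
  private String data;
  private Consumer<String> watch; // at most one pending one-time watch

  ToyZnode(String initial) { this.data = initial; }

  synchronized String getData() { return data; }

  /** Reads the data and (re)registers a one-time watch for the next change. */
  synchronized String getDataAndWatch(Consumer<String> w) {
    watch = w;
    return data;
  }

  /** Changes the data; fires the pending watch, if any, and clears it. */
  void setData(String newData) {
    Consumer<String> toFire;
    synchronized (this) {
      data = newData;
      toFire = watch;
      watch = null; // one-time trigger: used up by this change
    }
    if (toFire != null) {
      toFire.accept("NodeDataChanged"); // the event itself carries no data
    }
  }

  /** Watch registered once and never re-set; returns the states the client reads. */
  static List<String> watchOnce(String... transitions) {
    ToyZnode znode = new ToyZnode("OFFLINE");
    List<String> events = new ArrayList<>();
    znode.getDataAndWatch(events::add);
    for (String t : transitions) {
      znode.setData(t); // only the first change finds a pending watch
    }
    // The client handles its events later, reading whatever is current then.
    List<String> observed = new ArrayList<>();
    for (int i = 0; i < events.size(); i++) {
      observed.add(znode.getData());
    }
    return observed;
  }

  /** Watch re-set on every read, so each change produces a notification. */
  static List<String> rewatchEachTime(String... transitions) {
    ToyZnode znode = new ToyZnode("OFFLINE");
    List<String> observed = new ArrayList<>();
    class Rewatcher implements Consumer<String> {
      public void accept(String event) {
        observed.add(znode.getDataAndWatch(this)); // read and re-register together
      }
    }
    znode.getDataAndWatch(new Rewatcher());
    for (String t : transitions) {
      znode.setData(t);
    }
    return observed;
  }

  public static void main(String[] args) {
    System.out.println(watchOnce("OPENING", "OPENING", "OPENED"));       // [OPENED]
    System.out.println(rewatchEachTime("OPENING", "OPENING", "OPENED")); // [OPENING, OPENING, OPENED]
  }
}
```

The toy delivers notifications synchronously on one thread, which is why re-watching catches every change here; real ZooKeeper delivers watch events asynchronously, so even a re-set watch can coalesce updates made before it is re-registered.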

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

