hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3136) Stale reads from ZK can break the atomic CAS operations we have in ZKAssign
Date Wed, 20 Oct 2010 23:48:28 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923241#action_12923241 ]

HBase Review Board commented on HBASE-3136:

Message from: "Jonathan Gray" <jgray@apache.org>

(Updated 2010-10-20 16:47:05.987539)

Review request for hbase, Todd Lipcon and stack.


The last patch didn't apply for some reason, so I made a new one.


Adds a sync(path) operation to ZooKeeperWatcher (ZKW) and three calls to it from the CAS operations in ZKAssign.

This addresses bug HBASE-3136.

Diffs (updated)

  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1025790 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1025790 

Diff: http://review.cloudera.org/r/1056/diff


Still need to test more.  I'm not sure it's possible (or feasible in a reasonable amount of
time) to make a unit test for this.  We'd probably need to dig into ZK or mock the hell out
of stuff.



> Stale reads from ZK can break the atomic CAS operations we have in ZKAssign
> ---------------------------------------------------------------------------
>                 Key: HBASE-3136
>                 URL: https://issues.apache.org/jira/browse/HBASE-3136
>             Project: HBase
>          Issue Type: Bug
>          Components: zookeeper
>    Affects Versions: 0.89.20100621, 0.89.20100924, 0.90.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.90.0
>
> With ZK-based region transitions, we rely on atomic state changes of regions in transition.
> For example, an RS needs to atomically switch a node from OFFLINE to OPENING, or the master
> needs to delete nodes that are in the OPENED state, etc.
> The way we implement this is by:
> - Read existing data (returns byte[] and version in Stat)
> - Verify data is in expected state
> - Update to the new state, passing the expected version previously read
> This doesn't always work as expected because the initial read of the existing data could
> be a stale read (in ZK, writes are quorum writes but reads are not, so you can get stale data).
> Can provide a more explicit example if anyone is interested, but a fix is coming.
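
The read / verify / versioned-update sequence described above, and the effect of syncing before the read, can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: it is not the ZooKeeper client API and not the code in the patch. The names `VersionedNode`, `CasSketch`, and `transition` are hypothetical, and the model simply assumes that reads are answered from a copy that may lag behind the committed data until `sync()` is called.

```java
// Toy in-memory model of a single znode. This is NOT the ZooKeeper client API;
// every class and method name here is hypothetical.
final class VersionedNode {
    // Committed copy, as agreed by the quorum.
    private String data;
    private int version;
    // Copy held by the server this client reads from; it may lag behind.
    private String replicaData;
    private int replicaVersion;

    VersionedNode(String initial) {
        this.data = initial;
        this.replicaData = initial;
    }

    /** Result of a read: the data plus the version it was read at. */
    static final class Read {
        final String data;
        final int version;

        Read(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Reads are served from the possibly lagging copy. */
    Read read() {
        return new Read(replicaData, replicaVersion);
    }

    /** Versioned write against the committed copy; rejected on a version mismatch. */
    boolean setData(String newData, int expectedVersion) {
        if (expectedVersion != version) {
            return false;
        }
        data = newData;
        version++;
        return true;
    }

    /** Brings the lagging copy up to date with the committed copy. */
    void sync() {
        replicaData = data;
        replicaVersion = version;
    }
}

final class CasSketch {
    /**
     * Read existing data, verify it is in the expected state, then update
     * to the new state while passing the version that was read.
     */
    static boolean transition(VersionedNode node, String expected, String next,
                              boolean syncFirst) {
        if (syncFirst) {
            node.sync(); // assumption: syncing before the read avoids the stale view
        }
        VersionedNode.Read current = node.read();
        if (!current.data.equals(expected)) {
            return false; // node does not appear to be in the expected state
        }
        return node.setData(next, current.version);
    }
}
```

In this model, if a node created as `OFFLINE` is moved to `OPENING` with `setData("OPENING", 0)`, then `transition(node, "OPENING", "OPENED", false)` returns false because the read still sees `OFFLINE`, while the same call with `syncFirst` set to true succeeds.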

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
