curator-dev mailing list archives

From "Josh Pattiz (JIRA)" <>
Subject [jira] [Comment Edited] (CURATOR-318) Threads may return different boolean values when entering same double barrier
Date Sat, 14 Jul 2018 06:55:00 GMT


Josh Pattiz edited comment on CURATOR-318 at 7/14/18 6:54 AM:

I added a test for the problem and opened a PR with a simple fix that _mostly_ resolves
it. I believe some race conditions remain, but the window for them is dramatically smaller.
Before the fix the problem was effectively permanent: once entering the double barrier timed
out for any client, the barrier was essentially broken.

was (Author: htuy):
I added a test for the problem. I can open a PR with a simple fix that just deletes a client's
entry in the barrier if it times out trying to enter. That is not a perfect fix, and I think
a perfect one would require a decent amount of rework of how the barrier functions, but it is
probably reasonable in most cases. Right now the double barrier basically enters a broken state
if any client times out trying to enter.
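
To illustrate the idea of deleting a client's entry when it times out, here is a minimal
in-memory sketch. It is not Curator code: the class `BarrierCleanupSketch`, its fields, and the
client names are hypothetical, a plain set stands in for the barrier's child znodes, and a client
that does not find enough entries on arrival is simply treated as having timed out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of a double barrier's entry phase; not Curator code.
class BarrierCleanupSketch {
    // Stands in for the child znodes that clients create under the barrier path.
    private final Set<String> entries = new HashSet<>();
    private final int memberQty;
    private final boolean removeEntryOnTimeout;

    BarrierCleanupSketch(int memberQty, boolean removeEntryOnTimeout) {
        this.memberQty = memberQty;
        this.removeEntryOnTimeout = removeEntryOnTimeout;
    }

    // Simplification: no real waiting. A client that does not see enough
    // entries when it arrives is treated as having timed out.
    boolean enter(String clientId) {
        entries.add(clientId);
        boolean entered = entries.size() >= memberQty;
        if (!entered && removeEntryOnTimeout) {
            // The simple fix: a client that timed out removes its own entry.
            entries.remove(clientId);
        }
        return entered;
    }

    // Two clients time out, then a third arrives; returns the third client's result.
    static boolean lateEntrantResult(boolean removeEntryOnTimeout) {
        BarrierCleanupSketch barrier = new BarrierCleanupSketch(3, removeEntryOnTimeout);
        barrier.enter("client-a"); // times out
        barrier.enter("client-b"); // times out
        return barrier.enter("client-c");
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + lateEntrantResult(false));
        System.out.println("with cleanup:    " + lateEntrantResult(true));
    }
}
```

Without the cleanup, the late client counts the two stale entries and enters alone. With it, the
late client sees only its own entry. The sketch is single-threaded, so it says nothing about the
races a real distributed barrier would still face.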

> Threads may return different boolean values when entering same double barrier
> -----------------------------------------------------------------------------
>                 Key: CURATOR-318
>                 URL:
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.10.0
>            Reporter: Shiliang Cao
>            Priority: Major
>         Attachments:,
> To my understanding, when all threads try to enter a barrier, they should all succeed
> or all fail, which means their return values should be the same.
> But they can actually get different return values in this situation (steps to reproduce):
> 0. Do the preparation work, such as running a ZooKeeper server and writing the basic Curator connection code;
> 1. Prepare 3 threads: thread1, thread2, and thread3;
> 2. Thread1 sleeps 20 seconds and then enters the barrier; thread2 and thread3 try to enter the
> barrier immediately, with the timeout set to 5 seconds;
> 3. Result: thread2 and thread3 return false due to the timeout, as expected, but thread1
> (the sleeping one) returns true, which I think should be false too.
> Possible root cause, as observed via zkCli:
> When the enter methods of thread2 and thread3 returned, their path nodes remained, so when
> thread1 arrived, it assumed the other threads were still waiting, created the ready node, and
> returned true.
> If this is not by design, it should be considered a design defect.
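
The reproduction steps and the suspected root cause can be imitated without ZooKeeper. The sketch
below is a rough standard-library model, not Curator's implementation: the class `StaleNodeRepro`
and its methods are hypothetical, a set guarded by a monitor stands in for the barrier's path
nodes, and the 20 second sleep and 5 second timeout are scaled down to 400 ms and 100 ms. The one
behavior it copies from the report is that a node is left behind when `enter` times out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a double barrier whose nodes survive a timed-out enter.
class StaleNodeRepro {
    private final Set<String> nodes = new HashSet<>(); // models the barrier's path nodes
    private final int memberQty;

    StaleNodeRepro(int memberQty) {
        this.memberQty = memberQty;
    }

    // Registers the caller, then waits until memberQty nodes exist or maxWaitMs elapses.
    synchronized boolean enter(String id, long maxWaitMs) throws InterruptedException {
        nodes.add(id);
        notifyAll(); // wake any waiters so they can recount the nodes
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (nodes.size() < memberQty) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // timed out; the node is deliberately NOT removed
            }
            wait(remaining);
        }
        return true;
    }

    // Returns the results in the order { thread1, thread2, thread3 }.
    static boolean[] run() throws Exception {
        StaleNodeRepro barrier = new StaleNodeRepro(3);
        Callable<Boolean> second = () -> barrier.enter("thread2", 100);
        Callable<Boolean> third = () -> barrier.enter("thread3", 100);
        Callable<Boolean> late = () -> {
            Thread.sleep(400); // arrives well after the other two have timed out
            return barrier.enter("thread1", 100);
        };
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            Future<Boolean> t2 = pool.submit(second);
            Future<Boolean> t3 = pool.submit(third);
            Future<Boolean> t1 = pool.submit(late);
            return new boolean[] { t1.get(), t2.get(), t3.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println("thread1=" + r[0] + " thread2=" + r[1] + " thread3=" + r[2]);
    }
}
```

In this model thread2 and thread3 time out and return false, while the late thread1 finds their
leftover nodes, counts three members, and returns true. That is the inconsistent outcome
described in the report.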

