zookeeper-dev mailing list archives

From "Andor Molnar (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ZOOKEEPER-3162) Broken lock semantics in C client lock-recipe
Date Mon, 12 Nov 2018 22:22:00 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andor Molnar resolved ZOOKEEPER-3162.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.5.5
                   3.4.14
                   3.6.0

Issue resolved by pull request 699
[https://github.com/apache/zookeeper/pull/699]

> Broken lock semantics in C client lock-recipe
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-3162
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3162
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.0.0, 3.4.13
>            Reporter: Andrea Reale
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.6.0, 3.4.14, 3.5.5
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> As reported (but never fixed) in the past by ZOOKEEPER-2409, ZOOKEEPER-2038 and (partly)
> ZOOKEEPER-2878, the C client lock-recipe implementation is broken.
> I identified three issues.
> The main one (as also reported in the aforementioned issues) is that the logic that walks
> the lock waiting list is broken. child_floor uses strcmp and compares the full node name
> (i.e., sessionID-sequence) rather than only the sequence number. This makes it possible for
> two different clients to hold the lock at the same time. Assume two clients, one associated
> with session A and the other with session B, with A < B lexicographically. Now suppose a
> thread in B holds the lock and a thread in A tries to acquire the same lock. Because of the
> wrong comparison, A also obtains the lock, so two clients now hold it simultaneously.
> The second issue is a possible deadlock inside zkr_lock_operation. zkr_lock_operation is
> always called while holding the pthread mutex associated with the client lock. In some cases,
> zkr_lock_operation may decide to give up locking and call zkr_lock_unlock to release the lock.
> When this happens, zkr_lock_unlock tries to acquire the same pthread mutex again, which leads
> to a deadlock.
> The third issue relates to the return value of zkr_lock_lock. According to the API docs,
> the function returns 0 when no error occurs; it is then up to the caller to check whether
> the lock is held by calling zkr_lock_isowner. However, when no error occurs the implementation
> returns the result of zkr_lock_isowner. This is wrong because it becomes impossible to
> distinguish an error condition from a success without ownership. Instead, as the docs
> describe, the API should always return 0 when no error occurs.
> I will shortly add a link to a PR that fixes these issues.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
