zookeeper-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [zookeeper] mcfatealan commented on issue #1077: ZOOKEEPER-3531: Synchronization on ACLCache cause cluster to hang whe…
Date Fri, 06 Sep 2019 16:44:56 GMT
mcfatealan commented on issue #1077: ZOOKEEPER-3531: Synchronization on ACLCache cause cluster
to hang whe…
URL: https://github.com/apache/zookeeper/pull/1077#issuecomment-528928031
 
 
   Hi @lvfangmin thanks for pointing out the potential risks here!
   I totally agree that we should always be careful about potential correctness issue when
dealing with changes to synchronization. Meanwhile, in this case (plz correct me if I'm wrong
here) I think the proposed fix is rather safe that: 1) it does not break existing fixes in
ZOOKEEPER-3306 2) it does not introduce new problems.
   
   ### 1)
   The race condition mentioned in the ZOOKEEPER-3306 exists before the fix mentioned in this
thread. We carefully inspect the issue you mentioned in ZOOKEEPER-3306. And the fix introduced
in ZOOKEEPER-3306 is not to prevent this race condition to avoid the fuzzy snapshot, but to
add the missing ACL during replay. New transactions are always possible between `serializeACL`
and `serializeNodes`, and our fix would not create a new race condition here.
   
   After the fix we can still pass the test `testACLCreatedDuringFuzzySnapshotSync`.
   
   ### 2)
   Essentially the proposed fix is finer-grained synchronization. The original codes look
like:
   **v1**
   ```java
   synchronized public void serialize(OutputArchive oa) throws IOException {
       writeACL(longKeyMap);
   }
   ```
   
   which is equilvalent to: 
   **v1.b=v1**
   
   ```java
   public void serialize(OutputArchive oa) throws IOException {
       synchronized (this) {
           clonedLongKeyMap = copyACL(longKeyMap);
           writeACL(clonedLongKeyMap);
       }
   }
   ```
   
   This is how codes look like after our fix:
   **v2**
   
   ```java
   public void serialize(OutputArchive oa) throws IOException {
       synchronized (this) {
           clonedLongKeyMap = copyACL(longKeyMap);
       }
       writeACL(clonedLongKeyMap);
   }
   ```
   Because `clonedLongKeyMap` is a local, immutable data structure, it would not create new
correctness issues even we move it out of synchronization blocks. Suppose right now we have
a thread `t2` coming in when `t1` is doing `writeACL`, it is still not gonna changing the
in-memory state or on-disk records. The only difference this change made is when `writeACL`
gets stuck, it would get stuck outside the synchronization block so it would not block other
access to ACLCache.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message