curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jordan Zimmerman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-527) Concurrency issue in LockInternals
Date Wed, 05 Jun 2019 17:06:00 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856870#comment-16856870
] 

Jordan Zimmerman commented on CURATOR-527:
------------------------------------------

FYI - just in case I wrote a quick test to verify. I added these fields to LockInternals:

{code}
    static volatile CountDownLatch debugAfterGetDataLatchWaiter = null;
    static volatile CountDownLatch debugAfterGetDataLatchReady = null;
{code}

Then, in LockInternals .internalLockLoop(), I added this:

{code}
                            client.getData().usingWatcher(watcher).forPath(previousSequencePath);

                            if ( debugAfterGetDataLatchWaiter != null )
                            {
                                debugAfterGetDataLatchReady.countDown();
                                debugAfterGetDataLatchWaiter.await();
                            }
{code}

This has the effect of (when the debug latches are set) {{debugAfterGetDataLatchReady}} will
count down after getData is called and the block on {{debugAfterGetDataLatchWaiter}}. 

With this in place, I wrote this test in TestInterProcessMutexBase:

{code}
    @Test
    public void testDeletionBetweenGetDataAndWait() throws Exception {
        final Timing timing = new Timing();
        final CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(),
timing.session(), timing.connection(), new ExponentialBackoffRetry(100, 3));
        try
        {
            client.start();

            InterProcessLock lock1 = makeLock(client);
            InterProcessLock lock2 = makeLock(client);

            CountDownLatch lock1latchReady = new CountDownLatch(1);
            CountDownLatch lock1latchWait = new CountDownLatch(1);
            CompletableFuture.runAsync((() -> {
                try
                {
                    lock1.acquire();
                    System.out.println("Lock1 acquired - waiting");
                    lock1latchReady.countDown();
                    lock1latchWait.await();
                }
                catch ( Exception e )
                {
                    throw new RuntimeException(e);
                }
                finally
                {
                    try
                    {
                        lock1.release();
                        System.out.println("Lock1 released");
                    }
                    catch ( Exception e )
                    {
                        // ignore
                    }
                }
            }));

            timing.awaitLatch(lock1latchReady);

            LockInternals.debugAfterGetDataLatchReady = new CountDownLatch(1);
            LockInternals.debugAfterGetDataLatchWaiter = new CountDownLatch(1);

            CompletableFuture<Void> lock2Future = CompletableFuture.runAsync((() ->
{
                try
                {
                    System.out.println("Pre lock2 acquired");
                    lock2.acquire();
                    System.out.println("Post lock2 acquired");
                }
                catch ( Exception e )
                {
                    throw new RuntimeException(e);
                }
            }));

            System.out.println("Wait a bit 1");
            timing.sleepABit();
            System.out.println("Tell lock 1 to release");
            lock1latchWait.countDown();
            System.out.println("Wait a bit 2");
            timing.sleepABit();
            System.out.println("debugAfterGetDataLatchWaiter count down");
            LockInternals.debugAfterGetDataLatchWaiter.countDown();

            try
            {
                lock2Future.get(timing.forWaiting().milliseconds(), TimeUnit.MILLISECONDS);
            }
            catch ( Exception e )
            {
                Assert.fail("lock2 did not acquire");
            }
        }
        finally
        {
            CloseableUtils.closeQuietly(client);
        }
    }
{code}

It creates a lock in a thread, holds that lock and waits for it to acquire. Then, it sets
the debug latches, and tries to acquire a second lock on the same path. It waits for the {{debugAfterGetDataLatchReady}}
to be ready thus ensuring the second lock's thread is blocked after getData and before wait().
It then signals the lock1 thread to release the lock. After waiting a bit, it counts down
the debug latch so that lock2 can proceed. Due to the watcher being called, as soon as lock2's
thread waits, the watcher will notify it and it will return, thus continuing the lock process.


The test succeeds.

> Concurrency issue in LockInternals
> ----------------------------------
>
>                 Key: CURATOR-527
>                 URL: https://issues.apache.org/jira/browse/CURATOR-527
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.12.0
>         Environment: Curator 2.12.0
> zookeeper 3.4.14
>            Reporter: Kim Jaechang
>            Priority: Major
>
> I'm using InterProcessMutex and InterProcessMutex often failed to acquire lock.
> In LockInternals.internalLockLoop(), watcher is registered to zookeeper and call wait()
like below
> {code:java}
> client.getData().usingWatcher(watcher).forPath(previousSequencePath);
> if ( millisToWait != null )
> {
>     millisToWait -= (System.currentTimeMillis() - startMillis);
>     startMillis = System.currentTimeMillis();
>     if ( millisToWait <= 0 )
>     {
>         doDelete = true;    // timed out - delete our node
>         break;
>     }
>     wait(millisToWait);
> }
> else
> {
>     wait();
> }
> {code}
> In my case, my program is waiting previousSequencePath=_c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759
to be deleted.
> But _c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 is deleted between client.getData()
and wait().
> if _c_f290140d-9856-42ad-b9bf-348ffc086062-lock-0000000759 is deleted when 
> client.getData().usingWatcher(watcher).forPath(previousSequencePath) is called, it will
throw Exception but it was exist at that time.
> I'm using Curator 2.12.0 but latest version seems to have same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message