geode-issues mailing list archives

From Juan José Ramos Cassella (JIRA) <j...@apache.org>
Subject [jira] [Created] (GEODE-7062) CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
Date Thu, 08 Aug 2019 15:07:00 GMT
Juan José Ramos Cassella created GEODE-7062:
-----------------------------------------------

             Summary: CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
                 Key: GEODE-7062
                 URL: https://issues.apache.org/jira/browse/GEODE-7062
             Project: Geode
          Issue Type: Bug
          Components: tests
            Reporter: Juan José Ramos Cassella


The test {{testSuspendLockingBlocksUntilNoLocks}} from class {{DistributedLockServiceDUnitTest}}
failed twice in CI runs [967|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/967]
and [969|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/969].
Results for the first failure are available [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565222926/]
and for the second one [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565246507/].
Archived artifacts for the first failure are available [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565222926/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz]
and for the second one [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565246507/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz].

The failure appears to be caused by a race condition when launching an asynchronous task on a remote {{VM}}
through the following code:
{code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
    VM vm1 = getVM(1);
    vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
      @Override
      public void run() {
        DistributedLockService service2 = getServiceNamed(name);
        assertThat(service2.lock("lock", -1, -1)).isTrue();
        synchronized (monitor) {
          try {
            monitor.wait();
          } catch (InterruptedException ex) {
            out.println("Unexpected InterruptedException");
            fail("interrupted");
          }
        }
        service2.unlock("lock");
      }
    });
    // Let vm1's thread get the lock and go into wait()
    sleep(100);
{code}

If the thread has not started on the remote {{VM}} (and acquired the lock) by the time the 100 millisecond
sleep ends, the test fails: the thread on the local {{VM}} is able to invoke {{suspendLocking}}
right away instead of blocking, so the following assertion fails:
{code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
    Thread thread = new Thread(new Runnable() {
      @Override
      public void run() {
        setGot(service.suspendLocking(-1));
        setDone(true);
        service.resumeLocking();
      }
    });
    setGot(false);
    setDone(false);
    thread.start();

    // Let thread start, make sure it's blocked in suspendLocking
    sleep(100);
    assertThat(getGot() || getDone())
        .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone()).isFalse();
{code}
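For the second {{sleep(100)}}, whose only purpose is to check that the local thread is blocked in {{suspendLocking}}, one option is to poll the thread's {{Thread.State}} against a deadline. This is a minimal single-JVM sketch under stated assumptions, not the test's code: a {{CountDownLatch}} stands in for {{suspendLocking(-1)}}, and {{BlockedStateCheck}}, {{awaitParked}} and {{demo}} are hypothetical names. A parked state is only a strong hint that the thread is blocked where expected (it could be waiting on some other monitor), and this check alone does not remove the race in vm1; it turns a silent timing assumption into an explicit condition that fails fast when the call returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockedStateCheck {

  /**
   * Polls the thread's state until it is parked (WAITING, TIMED_WAITING or
   * BLOCKED) or the timeout elapses. Returns false as soon as the thread has
   * TERMINATED, i.e. the call it was expected to block in already returned.
   */
  static boolean awaitParked(Thread thread, long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (System.nanoTime() - deadline < 0) {
      Thread.State state = thread.getState();
      if (state == Thread.State.WAITING
          || state == Thread.State.TIMED_WAITING
          || state == Thread.State.BLOCKED) {
        return true;
      }
      if (state == Thread.State.TERMINATED) {
        return false;
      }
      Thread.sleep(10); // short poll interval instead of one fixed sleep(100)
    }
    return false;
  }

  /** Returns {parked result for a blocking thread, parked result for a non-blocking one}. */
  static boolean[] demo() {
    try {
      // Hypothetical stand-in for a thread stuck in suspendLocking(-1).
      CountDownLatch release = new CountDownLatch(1);
      Thread blocker = new Thread(() -> {
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      blocker.start();
      boolean blockerParked = awaitParked(blocker, 10, TimeUnit.SECONDS);

      // Stand-in for the failure mode in this issue: the call returns immediately.
      Thread nonBlocker = new Thread(() -> { });
      nonBlocker.start();
      nonBlocker.join();
      boolean nonBlockerParked = awaitParked(nonBlocker, 1, TimeUnit.SECONDS);

      release.countDown();
      blocker.join();
      return new boolean[] {blockerParked, nonBlockerParked};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println(result[0] + " " + result[1]); // prints "true false"
  }
}
```

The design choice here is to distinguish "still parked" from "already finished" explicitly, so a run where {{suspendLocking}} returns immediately is reported as such rather than depending on whether 100 milliseconds happened to be long enough.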

Increasing the sleep time might reduce the chance of the issue recurring; a better option would be
to investigate how to make the test wait *until* the asynchronous invocation has actually started on
the remote {{VM}}, instead of arbitrarily sleeping for 100 milliseconds.
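A minimal single-JVM sketch of the "wait until started" idea: the thread that takes the lock counts down a {{CountDownLatch}} once it holds it, and the test awaits that latch (with a timeout) instead of sleeping. A {{ReentrantLock}} stands in for {{DistributedLockService}}, and {{LockHandshake}}, {{LOCK}} and {{lockHeldAfterHandshake}} are hypothetical names. A latch cannot cross JVM boundaries, so in the actual DUnit test the signal would have to travel between VMs by some other means (for example a shared gate provided by the DUnit framework, or polling vm1 with a synchronous {{invoke}}); that part is an assumption, not something verified here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockHandshake {

  // Hypothetical stand-in for the distributed lock named "lock" that vm1 takes.
  static final ReentrantLock LOCK = new ReentrantLock();

  /**
   * Starts a holder thread that takes LOCK and then parks until released.
   * Returns whether the caller saw LOCK held by another thread once the
   * handshake completed (the point where the test would call suspendLocking).
   */
  static boolean lockHeldAfterHandshake() {
    CountDownLatch lockAcquired = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    Thread holder = new Thread(() -> {
      LOCK.lock();
      try {
        lockAcquired.countDown(); // signal: the lock is now held
        release.await();          // park until the test lets go (like monitor.wait())
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        LOCK.unlock();
      }
    });
    holder.start();

    try {
      // Instead of sleep(100): wait for the explicit signal, and fail loudly
      // if it never arrives within a generous timeout.
      if (!lockAcquired.await(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("holder never acquired the lock");
      }
      boolean heldByOther = LOCK.isLocked() && !LOCK.isHeldByCurrentThread();

      release.countDown(); // counterpart of notifying the monitor in the test
      holder.join();
      return heldByOther;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(lockHeldAfterHandshake()); // prints "true"
    System.out.println(LOCK.isLocked());          // prints "false"
  }
}
```

Because the main thread only proceeds after the holder has explicitly reported owning the lock, the outcome no longer depends on thread scheduling; a slow start just takes longer, and a start that never happens produces a clear timeout error instead of a misleading assertion failure.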



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
