reef-dev mailing list archives

From "Sergiy Matusevych (JIRA)" <j...@apache.org>
Subject [jira] [Created] (REEF-1761) Race condition in NetworkMessagingTestService
Date Thu, 30 Mar 2017 22:46:41 GMT
Sergiy Matusevych created REEF-1761:
---------------------------------------

             Summary: Race condition in NetworkMessagingTestService
                 Key: REEF-1761
                 URL: https://issues.apache.org/jira/browse/REEF-1761
             Project: REEF
          Issue Type: Bug
          Components: REEF-IO, Wake
            Reporter: Sergiy Matusevych
            Assignee: Sergiy Matusevych
            Priority: Minor


When running unit tests with the finest level of logging, some Wake tests hang due to a race
condition in the {{NetworkMessagingTestService.MessageHandler.onNext()}} method.

That happens because the method performs _two_ separate atomic operations: first, it invokes
{{AtomicInteger.incrementAndGet()}}, and later it calls {{AtomicInteger.get()}} to check the
new value. Between those two calls, the method writes a very long test message to the log, so
another thread can update the counter in the meantime, and {{get()}} may return a different
value than the one the increment produced.
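
To illustrate the gap between the two calls, here is a minimal, deterministic sketch. It is not the actual REEF test code: the class {{SplitAtomicReadDemo}}, the method {{incrementThenGet()}} and the {{betweenCalls}} hook are hypothetical names, and the hook simply stands in for the slow logging call during which another handler invocation may run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the pattern described above; not the REEF code itself.
final class SplitAtomicReadDemo {

  /**
   * Increments the counter, runs the given action, then reads the counter again.
   * Returns {value returned by incrementAndGet(), value returned by the later get()}.
   */
  static int[] incrementThenGet(final AtomicInteger count, final Runnable betweenCalls) {
    final int own = count.incrementAndGet(); // first atomic operation
    betweenCalls.run();                      // assumption: models the long log write
    final int later = count.get();           // second, separate atomic operation
    return new int[] {own, later};
  }

  public static void main(final String[] args) {
    final AtomicInteger count = new AtomicInteger();
    // Simulate a second handler invocation incrementing the counter during logging.
    final int[] seen = incrementThenGet(count, count::incrementAndGet);
    System.out.println("incrementAndGet() returned " + seen[0] + ", get() returned " + seen[1]);
    // prints: incrementAndGet() returned 1, get() returned 2
  }
}
```

Each call is atomic on its own, but the pair is not, so a check based on the later {{get()}} can observe a value that belongs to a different invocation.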

The error rarely occurs under normal circumstances, because by default we use the {{INFO}} log
level and the delay between the two atomic calls is minimal. When running with the {{mvn -Plog}}
profile (i.e. using the {{FINEST}} log level), the error happens all the time.

To fix the issue, we need to do the following:
   * Save the value returned from {{AtomicInteger.incrementAndGet()}} and use it throughout
the method;
   * Add an assertion that the message count never exceeds the expected value;
   * Write less data to the log, e.g. do not dump the entire content of each message.
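
The three steps above could look roughly like the following sketch. It is not the actual patch: {{CountingHandler}} is a hypothetical class, a {{CountDownLatch}} stands in for whatever notification mechanism the real test uses, and an {{IllegalStateException}} stands in for the proposed assertion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical handler illustrating the proposed fix; names are assumptions.
final class CountingHandler {

  private static final Logger LOG = Logger.getLogger(CountingHandler.class.getName());

  private final AtomicInteger count = new AtomicInteger();
  private final CountDownLatch done = new CountDownLatch(1); // assumption: completion signal
  private final int expected;

  CountingHandler(final int expected) {
    this.expected = expected;
  }

  void onNext(final String message) {
    // Step 1: save the result of the single atomic operation and reuse it below.
    final int current = count.incrementAndGet();

    // Step 2: fail loudly if more messages arrive than expected.
    if (current > expected) {
      throw new IllegalStateException(
          "Received " + current + " messages, expected at most " + expected);
    }

    // Step 3: log a short summary instead of the full message content.
    LOG.log(Level.FINEST, "Message {0} of {1}, length {2}",
        new Object[] {current, expected, message.length()});

    // The check uses the saved value, so logging time no longer affects the outcome.
    if (current == expected) {
      done.countDown();
    }
  }

  boolean isDone() {
    return done.getCount() == 0;
  }
}
```

Because {{incrementAndGet()}} returns a distinct value to every caller, exactly one invocation sees {{current == expected}}, regardless of how long the logging takes.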



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
