reef-dev mailing list archives

From "Sergiy Matusevych (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (REEF-1761) Race condition in NetworkMessagingTestService
Date Thu, 30 Mar 2017 22:48:42 GMT

     [ https://issues.apache.org/jira/browse/REEF-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergiy Matusevych updated REEF-1761:
------------------------------------
    Description: 
When running unit tests with the finest level of logging, some {{reef-io}} tests hang due
to a race condition in the {{NetworkMessagingTestService.MessageHandler.onNext()}} method.

That happens because the method does _two_ atomic operations separately: first, it invokes
{{AtomicInteger.incrementAndGet()}}, and later calls {{AtomicInteger.get()}} to check on the
new value. Between those two calls, the method writes a very long test message to the log.

The error rarely occurs in normal circumstances, because by default we use {{INFO}} log level
and the delay between the two atomic calls is minimal. When running with the {{mvn -Plog}} profile
(i.e. using the {{FINEST}} log level), the error happens all the time.

To fix the issue, we need to do the following:
   * Save the value returned from {{AtomicInteger.incrementAndGet()}} and use it throughout
the method;
   * Add an assertion that the message count never exceeds the expected value;
   * Write less data to the log, e.g. do not dump the entire content of each message.
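
The three steps above can be sketched as follows. This is only an illustration, not the actual {{NetworkMessagingTestService}} code: the class name {{CountingHandler}}, its fields, the {{String}} message type, and the use of a {{CountDownLatch}} to signal completion are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the test message handler; not the real REEF class. */
final class CountingHandler {

  private static final Logger LOG = Logger.getLogger(CountingHandler.class.getName());

  private final AtomicInteger count = new AtomicInteger(0);
  private final CountDownLatch done = new CountDownLatch(1);
  private final int expected;

  CountingHandler(final int expected) {
    this.expected = expected;
  }

  void onNext(final String message) {
    // Step 1: perform a single atomic operation and keep its result,
    // instead of calling get() again after the logging.
    final int current = this.count.incrementAndGet();

    // Step 2: fail loudly if more messages arrive than expected.
    // (An explicit check is used here because Java `assert` is off by default.)
    if (current > this.expected) {
      throw new IllegalStateException(
          "Received " + current + " messages, expected " + this.expected);
    }

    // Step 3: log a short summary rather than the entire message content.
    LOG.log(Level.FINEST, "Message {0} of {1}, length {2}",
        new Object[] {current, this.expected, message.length()});

    // Exactly one call observes current == expected, regardless of how long
    // the logging above takes in other threads.
    if (current == this.expected) {
      this.done.countDown();
    }
  }

  /** Non-blocking check that all expected messages have arrived. */
  boolean isDone() {
    return this.done.getCount() == 0;
  }

  /** Blocks until all expected messages have arrived or the timeout expires. */
  boolean await(final long timeout, final TimeUnit unit) throws InterruptedException {
    return this.done.await(timeout, unit);
  }
}
```

Because each call compares its own captured value, the completion check no longer depends on how much time the logging takes between the increment and the comparison.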

  was:
When running unit tests with the finest level of logging, some Wake tests hang due to a race
condition in the {{NetworkMessagingTestService.MessageHandler.onNext()}} method.

That happens because the method does _two_ atomic operations separately: first, it invokes
{{AtomicInteger.incrementAndGet()}}, and later calls {{AtomicInteger.get()}} to check on the
new value. Between those two calls, the method writes a very long test message to the log.

The error rarely occurs in normal circumstances, because by default we use {{INFO}} log level
and the delay between the two atomic calls is minimal. When running with the {{mvn -Plog}} profile
(i.e. using the {{FINEST}} log level), the error happens all the time.

To fix the issue, we need to do the following:
   * Save the value returned from {{AtomicInteger.incrementAndGet()}} and use it throughout
the method;
   * Add an assertion that the message count never exceeds the expected value;
   * Write less data to the log, e.g. do not dump the entire content of each message.


> Race condition in NetworkMessagingTestService
> ---------------------------------------------
>
>                 Key: REEF-1761
>                 URL: https://issues.apache.org/jira/browse/REEF-1761
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF-IO, Wake
>            Reporter: Sergiy Matusevych
>            Assignee: Sergiy Matusevych
>            Priority: Minor
>              Labels: logging, network, race-condition
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When running unit tests with the finest level of logging, some {{reef-io}} tests hang
due to a race condition in the {{NetworkMessagingTestService.MessageHandler.onNext()}} method.
> That happens because the method does _two_ atomic operations separately: first, it invokes
{{AtomicInteger.incrementAndGet()}}, and later calls {{AtomicInteger.get()}} to check on the
new value. Between those two calls, the method writes a very long test message to the log.
> The error rarely occurs in normal circumstances, because by default we use {{INFO}} log
level and the delay between the two atomic calls is minimal. When running with the {{mvn -Plog}}
profile (i.e. using the {{FINEST}} log level), the error happens all the time.
> To fix the issue, we need to do the following:
>    * Save the value returned from {{AtomicInteger.incrementAndGet()}} and use it throughout
the method;
>    * Add an assertion that the message count never exceeds the expected value;
>    * Write less data to the log, e.g. do not dump the entire content of each message.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
