spark-issues mailing list archives

From "William Montaz (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-24150) Race condition in FsHistoryProvider
Date Wed, 02 May 2018 16:57:01 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Montaz updated SPARK-24150:
-----------------------------------
    Description: 
There is a race condition in the checkLogs method between the threads of replayExecutor. They
use the field "applications" as the monitor to synchronise on, but they also reassign that field.

The problem is that the threads will eventually synchronise on different monitors (because they
synchronise on whichever object "applications" references at that moment, and that reference
keeps changing), which defeats the purpose of the synchronisation. The race is even more likely
to occur when number_new_log_files > replayExecutor_pool_size.

If a log is lost this way (it will not be present in the list "applications"), it will be
impossible to read it from the UI (being in the list "applications" is a mandatory check to
avoid getting a 404).
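
The loss of mutual exclusion described above can be reproduced outside Spark. The following
is a minimal Java sketch, not the FsHistoryProvider code: the class name, the helper method
and the "app-1" entry are hypothetical, and latches are used only to make the interleaving
deterministic. The first thread locks the object currently referenced by "applications" and
then reassigns the field; the second thread locks the new object and enters its own critical
section while the first is still inside.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MovingMonitorDemo {
    // Hypothetical stand-in for the "applications" field: it is used as the
    // monitor AND reassigned inside the critical section.
    static volatile List<String> applications = new ArrayList<>();

    /** Returns true if the second thread entered its synchronized block while the first was still inside its own. */
    static boolean secondThreadEntersWhileFirstHoldsLock() {
        applications = new ArrayList<>();
        CountDownLatch fieldReplaced = new CountDownLatch(1);
        CountDownLatch secondInside = new CountDownLatch(1);
        boolean[] overlapped = {false};

        Thread first = new Thread(() -> {
            synchronized (applications) {            // locks the OLD list object
                List<String> merged = new ArrayList<>(applications);
                merged.add("app-1");
                applications = merged;               // the field now references a NEW object
                fieldReplaced.countDown();
                try {
                    // Still inside the critical section: wait for the other thread to get in.
                    overlapped[0] = secondInside.await(2, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread second = new Thread(() -> {
            try {
                fieldReplaced.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (applications) {            // locks the NEW list object, so nothing blocks it
                secondInside.countDown();
            }
        });

        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return overlapped[0];
    }

    public static void main(String[] args) {
        System.out.println("both threads inside at once: " + secondThreadEntersWhileFirstHoldsLock());
    }
}
```

If the helper returns true, both threads were inside a block guarded by "applications" at the
same time, so a copy-and-reassign update made by one of them can overwrite the other's.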

Workaround:
 * use a permanent object as the monitor on which to synchronise (or synchronise on `this`)
 * keep the field volatile for all other read accesses
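
The workaround can be sketched as follows. This is an illustration under stated assumptions
rather than the actual patch: the class, the lock field "applicationsLock", and the methods
addApplication and replay are hypothetical names. Writers copy, modify and republish the list
under one lock object that is never reassigned, while readers rely on the volatile field alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class StableMonitorSketch {
    // Permanent monitor: never reassigned, so every writer contends on the same lock.
    private final Object applicationsLock = new Object();

    // Volatile so that readers see the latest published snapshot without locking.
    private volatile List<String> applications = new ArrayList<>();

    /** Writer path: copy, modify and publish, all under the one stable lock. */
    void addApplication(String appId) {
        synchronized (applicationsLock) {
            List<String> merged = new ArrayList<>(applications);
            merged.add(appId);
            applications = merged;   // published lists are never mutated afterwards
        }
    }

    /** Reader path: a volatile read of an immutable snapshot, no lock needed. */
    boolean contains(String appId) {
        return applications.contains(appId);
    }

    int size() {
        return applications.size();
    }

    /** Records logCount hypothetical logs on a pool of poolSize threads and returns how many were kept. */
    static int replay(int logCount, int poolSize) {
        StableMonitorSketch provider = new StableMonitorSketch();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < logCount; i++) {
            String appId = "app-" + i;
            pool.submit(() -> provider.addApplication(appId));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return provider.size();
    }

    public static void main(String[] args) {
        // More logs than pool threads, the situation the report singles out.
        System.out.println(replay(200, 4));
    }
}
```

Because the lock object never changes, no update can be overwritten by a concurrent writer,
so every submitted entry should still be present once the pool has drained.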

  was:
There is a race condition in the checkLogs method between the threads of replayExecutor. They
use the field "applications" as the monitor to synchronise on, but they also reassign that field.

The problem is that the threads will eventually synchronise on different monitors (because they
synchronise on whichever object "applications" references at that moment, and that reference
keeps changing), which defeats the purpose of the synchronisation. The race is even more likely
to occur when number_new_log_files > replayExecutor_pool_size.

Workaround:
 * use a permanent object as the monitor on which to synchronise (or synchronise on `this`)
 * keep the field volatile for all other read accesses


> Race condition in FsHistoryProvider
> -----------------------------------
>
>                 Key: SPARK-24150
>                 URL: https://issues.apache.org/jira/browse/SPARK-24150
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: William Montaz
>            Priority: Major
>
> There is a race condition in the checkLogs method between the threads of replayExecutor. They
> use the field "applications" as the monitor to synchronise on, but they also reassign that field.
> The problem is that the threads will eventually synchronise on different monitors (because they
> synchronise on whichever object "applications" references at that moment, and that reference
> keeps changing), which defeats the purpose of the synchronisation. The race is even more likely
> to occur when number_new_log_files > replayExecutor_pool_size.
> If a log is lost this way (it will not be present in the list "applications"), it will be
> impossible to read it from the UI (being in the list "applications" is a mandatory check to
> avoid getting a 404).
> Workaround:
>  * use a permanent object as the monitor on which to synchronise (or synchronise on `this`)
>  * keep the field volatile for all other read accesses



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

