jackrabbit-oak-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6884) TarMK disk space check is not synchronized with FileStore opened state
Date Fri, 01 Dec 2017 14:10:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274443#comment-16274443
] 

Francesco Mari commented on OAK-6884:
-------------------------------------

[~mduerig], I don't think that's sufficient. {{ShutDown#shutDown}} might be called from the
main thread after the check on {{ShutDown#shutDownRequested}} in the disk space check thread,
but before the call to {{FileStore#size}}. 

To implement a safer solution it's necessary to create another synchronization primitive in
{{ShutDown}} that allows us to do the following in the disk space thread.

{noformat}
try (ShutDownCloser ignored = shutDown.tryKeepAlive()) {
    if (shutDown.shutDownRequested()) {
        return;
    }
    checkDiskSpace(gcOptions);
}
(noformat}

The implementation of {{tryKeepAlive}} (naming could be improved) is equivalent to {{keepAlive}},
except for the lack of the check on {{shutdownRequested}} and the {{IllegalStateException}}.
{{tryKeepAlive}} is necessary to properly synchronize with {{ShutDown#shutDown}}. Callers
of {{tryKeepAlive}} must always check {{ShutDown#shutDownRequested}} as soon as they enter
in the try-with-resource block, and exit if the check is positive. There are only two possible
reasons why the check could be positive. First, a user asked the {{FileStore}} to shut down,
but nothing has been closed yet. Second, the {{FileStore}} has already been shut down. Whatever
the reason, it is necessary to bail out either for responsiveness or for correctness.

By the way, while looking at the code I figured out that the flush thread suffers from the
same race condition.

> TarMK disk space check is not synchronized with FileStore opened state
> ----------------------------------------------------------------------
>
>                 Key: OAK-6884
>                 URL: https://issues.apache.org/jira/browse/OAK-6884
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, upgrade
>    Affects Versions: 1.7.9
>            Reporter: Arek Kita
>            Assignee: Michael Dürig
>            Priority: Minor
>              Labels: concurrency, production
>             Fix For: 1.8, 1.7.14
>
>
> It seems that the disk space check is not properly synchronized with {{FileStore}} as
I revealed a race condition while using oak-upgrade during migration to {{segment-tar}}.
> The {{FileStore}} instance is closed while TarMK disk check tries to execute and it seems
it is dependent on the state of segment ({{org.apache.jackrabbit.oak.segment.file.FileStore.checkDiskSpace(FileStore.java:541)}}
that needs to be opened. 
> {noformat}
> 30.10.2017 11:26:05.834 WARN   o.a.j.o.s.f.Scheduler: The scheduler FileStore background
tasks takes too long to shut down
> 30.10.2017 11:26:11.674 INFO   o.a.j.o.s.f.FileStore: TarMK closed: /data/cq/crx-quickstart/repository-segment-tar-20171030-112401/segmentstore
> 30.10.2017 11:26:11.676 ERROR  o.a.j.o.s.f.SafeRunnable: Uncaught exception in TarMK
disk space check [/data/cq/crx-quickstart/repository-segment-tar-20171030-112401/segmentstore]
> java.lang.IllegalStateException: already shut down
>     at org.apache.jackrabbit.oak.segment.file.ShutDown.keepAlive(ShutDown.java:42)
>     at org.apache.jackrabbit.oak.segment.file.FileStore.size(FileStore.java:302)
>     at org.apache.jackrabbit.oak.segment.file.FileStore.checkDiskSpace(FileStore.java:541)
>     at org.apache.jackrabbit.oak.segment.file.FileStore.access$300(FileStore.java:102)
>     at org.apache.jackrabbit.oak.segment.file.FileStore$3.run(FileStore.java:237)
>     at org.apache.jackrabbit.oak.segment.file.SafeRunnable.run(SafeRunnable.java:67)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message