jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-8014) Commits carrying over from previous GC generation can block other threads from committing
Date Mon, 04 Feb 2019 16:00:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759962#comment-16759962
] 

Michael Dürig edited comment on OAK-8014 at 2/4/19 3:59 PM:
------------------------------------------------------------

[https://github.com/mduerig/jackrabbit-oak/commit/3675765e73061f470456c796b93ea6b4ee2c0cd7]
is an attempt at implementing the approach outlined in my previous comment. So far the patch
contains a bare minimum of changes, and the lock wait time is hard-coded to 1 second. Going forward,
the patch needs further refinements, e.g. encapsulating the {{commitSemaphore}} in the
{{LockAdapter}} everywhere. If I got things right, the patch should not change the behaviour
of the {{LockBasedScheduler}} in the absence of a compaction completing. Also, setting the lock wait
time to infinity should restore the previous behaviour under all conditions.
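
For illustration only (this is not the code from the linked commit): a minimal sketch of a
commit permit with a bounded wait time. The class name {{TimedCommitLock}}, the {{INFINITE}}
sentinel and the method names are hypothetical and do not exist in Oak. The sketch only shows how
a finite wait can time out, and how an "infinite" wait degenerates to a plain blocking acquire.
What a caller does after a timeout is deliberately left open.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Oak: a single commit permit with a bounded wait. */
final class TimedCommitLock {
    /** Assumed sentinel meaning "wait forever", i.e. plain blocking behaviour. */
    static final long INFINITE = Long.MAX_VALUE;

    // One fair permit: at most one holder at a time, waiters served in FIFO order.
    private final Semaphore permit = new Semaphore(1, true);
    private final long waitMillis;

    TimedCommitLock(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    /**
     * Tries to obtain the permit.
     *
     * @return true if the permit was acquired; false if the wait time elapsed
     *         or the calling thread was interrupted while waiting
     */
    boolean acquire() {
        try {
            if (waitMillis == INFINITE) {
                permit.acquire(); // blocks until the permit becomes available
                return true;
            }
            return permit.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Must only be called after acquire() returned true. */
    void release() {
        permit.release();
    }
}
```

A caller would check the return value of {{acquire()}} and call {{release()}} in a
{{finally}} block only when the permit was actually obtained.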

While there are not many changes, I'm not too happy with the patch:
 * Hardly testable.
 * Allowing concurrent commits *should* work, but we never tested this extensively.
 * Hard to reason about: even after multiple hours of studying the patch I'm still not entirely
convinced that it does the right thing at all times, without potential race conditions
or deadlocks. The main problems here are the side effects, mutable state and overlapping locks.

[~ahanikel], could you have a look?


> Commits carrying over from previous GC generation can block other threads from committing
> -----------------------------------------------------------------------------------------
>
>                 Key: OAK-8014
>                 URL: https://issues.apache.org/jira/browse/OAK-8014
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>    Affects Versions: 1.10.0, 1.8.11
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Blocker
>              Labels: TarMK
>             Fix For: 1.12, 1.11.0, 1.8.12
>
>         Attachments: OAK-8014.patch
>
>
> A commit that is based on a previous (full) generation can block other commits from progressing
> for a long time. This happens because such a commit will do a deep copy of its state to avoid
> linking to old segments (see OAK-3348). Most of the deep copying is usually avoided by the
> deduplication caches. However, in cases where the cache hit rate is not good enough, we have
> seen deep copy operations taking up to several minutes. Sometimes this deep copy operation happens
> inside the commit lock of {{LockBasedScheduler.schedule()}}, which then causes all other
> commits to become blocked.
> cc [~rma61870@adobe.com], [~edivad]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
