jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-7867) Flush thread gets stuck when input stream of binaries block
Date Mon, 29 Oct 2018 15:19:05 GMT

    [ https://issues.apache.org/jira/browse/OAK-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667343#comment-16667343 ]

Michael Dürig commented on OAK-7867:
------------------------------------

* Re. *TimeOut*: deciding on a useful timeout is tricky. The type of input stream determines
that value, but that type lies in the user's business domain. OTOH the resilience characteristics
of the segment store rely on that value. Implementation-wise we would need to route each read
of the user-supplied input stream through a second thread so we are able to detect a timeout
and take action (see the first sketch below the list).
 * Re. *Flush on commit*: this looks like a clean solution. However, it changes the current
design, which decouples commit boundaries from writing (and writing ahead) the segments. At
this point in time it is not clear how this would impact performance, and we would need to
understand whether and to what extent small segments would be created by such an approach.
 * Re. *Replace {{SegmentWriter.writeStream()}}*: such a solution only works if all {{Blob}}
instances passed into the segment store behave properly. This is currently not the case (see
e.g. {{BinaryBasedBlob}}). While we could fix this instance, we would still rely on Oak API
consumers to behave properly. Also, "fixing" {{BinaryBasedBlob}} might have an effect on other
{{NodeStore}} instances.
 * Re. *Reduce lock granularity*: such a solution would require changes to low-level code
and would need rigorous testing. From an initial patch I'm working on it seems, however, that
the changes are limited to a few classes and in some cases even lead to a unification of the
code (see the second sketch below).
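
To illustrate the *TimeOut* option: below is a minimal, hypothetical sketch of routing each read through a second thread so a blocked {{InputStream.read(...)}} can be detected. {{TimedStreamReader}} and its signature are made up for illustration and are not part of Oak; a real implementation would also have to decide what taking action means once the timeout fires.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not Oak API: performs each read on a second thread
// so the caller can give up after a timeout instead of blocking forever.
class TimedStreamReader {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    int read(InputStream in, byte[] buffer, long timeout, TimeUnit unit)
            throws IOException, TimeoutException, InterruptedException {
        Future<Integer> pending = executor.submit(() -> in.read(buffer, 0, buffer.length));
        try {
            return pending.get(timeout, unit);
        } catch (TimeoutException e) {
            // Give up on this read; note that the reader thread itself may
            // stay blocked if the stream ignores interruption.
            pending.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }
}
{code}

Note that even with this in place a misbehaving stream can still leak the reader thread, which is one reason picking the timeout value carefully matters.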

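And to illustrate the *Reduce lock granularity* option: a hypothetical sketch of guarding each writer with its own lock instead of one pool-wide monitor, so that flush can skip a writer held by a thread that is stuck in {{read()}}. The names below ({{FineGrainedWriterPool}} etc.) are invented for illustration and deliberately ignore the details of {{SegmentBufferWriterPool}}.

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical illustration, not the actual patch: one lock per writer
// instead of a single monitor over the whole pool.
class FineGrainedWriterPool<W> {
    private final Map<W, ReentrantLock> locks = new ConcurrentHashMap<>();

    void register(W writer) {
        locks.put(writer, new ReentrantLock());
    }

    // Writing threads hold the per-writer lock only for the duration of an operation.
    <T> T withWriter(W writer, Function<W, T> operation) {
        ReentrantLock lock = locks.get(writer);
        lock.lock();
        try {
            return operation.apply(writer);
        } finally {
            lock.unlock();
        }
    }

    // The flush thread flushes every writer it can lock right now and skips
    // busy ones instead of waiting for all writers to be returned.
    void flushAvailable(FlushAction<W> action) throws IOException {
        for (Map.Entry<W, ReentrantLock> entry : locks.entrySet()) {
            if (entry.getValue().tryLock()) {
                try {
                    action.flush(entry.getKey());
                } finally {
                    entry.getValue().unlock();
                }
            }
        }
    }

    interface FlushAction<V> {
        void flush(V writer) throws IOException;
    }
}
{code}

The point of the sketch is only the {{tryLock()}} in the flush path: a writer blocked in {{read()}} no longer stalls flushing of the others, at the cost of possibly leaving that writer's segment unflushed.
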
> Flush thread gets stuck when input stream of binaries block
> -----------------------------------------------------------
>
>                 Key: OAK-7867
>                 URL: https://issues.apache.org/jira/browse/OAK-7867
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Critical
>              Labels: candidate_oak_1_6, candidate_oak_1_8
>             Fix For: 1.10
>
>
> This issue tackles the root cause of the severe data loss that has been reported in OAK-7852:
> When the input stream of a binary value blocks indefinitely on read, the flush thread of
the segment store gets blocked:
> {noformat}
> "pool-2-thread-1" #15 prio=5 os_prio=31 tid=0x00007fb0f21e3000 nid=0x5f03 waiting on
condition [0x000070000a46d000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000000076bba62b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at com.google.common.util.concurrent.Monitor.await(Monitor.java:963)
> at com.google.common.util.concurrent.Monitor.enterWhen(Monitor.java:402)
> at org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.safeEnterWhen(SegmentBufferWriterPool.java:179)
> at org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.flush(SegmentBufferWriterPool.java:138)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter.flush(DefaultSegmentWriter.java:138)
> at org.apache.jackrabbit.oak.segment.file.FileStore.lambda$doFlush$8(FileStore.java:307)
> at org.apache.jackrabbit.oak.segment.file.FileStore$$Lambda$22/1345968304.flush(Unknown Source)
> at org.apache.jackrabbit.oak.segment.file.TarRevisions.doFlush(TarRevisions.java:237)
> at org.apache.jackrabbit.oak.segment.file.TarRevisions.flush(TarRevisions.java:195)
> at org.apache.jackrabbit.oak.segment.file.FileStore.doFlush(FileStore.java:306)
> at org.apache.jackrabbit.oak.segment.file.FileStore.flush(FileStore.java:318)
> {noformat}
> The condition {{0x000070000a46d000}} is waiting for the following thread to return its {{SegmentBufferWriter}}, which will never happen if {{InputStream.read(...)}} does not progress.
> {noformat}
> "pool-1-thread-1" #14 prio=5 os_prio=31 tid=0x00007fb0f223a800 nid=0x5d03 runnable [0x000070000a369000
> ] java.lang.Thread.State: RUNNABLE
> at com.google.common.io.ByteStreams.read(ByteStreams.java:833)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.internalWriteStream(DefaultSegmentWriter.java:641)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeStream(DefaultSegmentWriter.java:618)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeBlob(DefaultSegmentWriter.java:577)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeProperty(DefaultSegmentWriter.java:691)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeProperty(DefaultSegmentWriter.java:677)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeNodeUncached(DefaultSegmentWriter.java:900)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeNode(DefaultSegmentWriter.java:799)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.access$800(DefaultSegmentWriter.java:252)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$8.execute(DefaultSegmentWriter.java:240)
> at org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.execute(SegmentBufferWriterPool.java:105)
> at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter.writeNode(DefaultSegmentWriter.java:235)
> at org.apache.jackrabbit.oak.segment.SegmentWriter.writeNode(SegmentWriter.java:79)
> {noformat}
>  
> This issue is critical as such a misbehaving input stream causes the flush thread to get
stuck, preventing transient segments from being flushed and thus causing data loss.
>  


