lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5002) Deadlock in DocumentsWriterFlushControl
Date Fri, 17 May 2013 11:43:17 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660586#comment-13660586 ]

Simon Willnauer commented on LUCENE-5002:
-----------------------------------------

OK, so I spent an entire day trying to make this work. Bottom line: once I move
DocumentsWriter#abort() out of the sync block, my test no longer hangs, but it still fails
all over the place. Concurrent access to IW while IW#deleteAll() is called is entirely broken
IMO. I don't even know where to start; here is a short summary of the failures I saw:
 - asserts trip in the global field map since we clear it while indexing concurrently (remember,
indexing is non-blocking)
 - concurrent commits fail with a file-not-found exception (even if we fully sync); it seems
some state in IW is not cleared
 - updatePendingMerges fails with FNF when merges are updated concurrently.

To begin with, I doubt that the semantics of IW#deleteAll() are correct today if you access
the IW concurrently. We basically drop everything without maintaining any happens-before
relationship at all: we delete all files that are not referenced in any seg info, wipe all
the global field infos, etc. We should address this properly.

I agree that we have to fix this for 4.3.1!

Yet, Sergiusz, do you see any FileNotFoundExceptions or anything similar when you call
deleteAll concurrently? This seems entirely broken to me at this point. For now I suggest you
use deleteDocuments(new MatchAllDocsQuery()) and not lock globally.

simon
                
> Deadlock in DocumentsWriterFlushControl
> ---------------------------------------
>
>                 Key: LUCENE-5002
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5002
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.3
>         Environment: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode)
> Linux Ubuntu Server 12.04 LTS 64-Bit
>            Reporter: Sergiusz Urbaniak
>            Assignee: Simon Willnauer
>             Fix For: 5.0, 4.4, 4.3.1
>
>         Attachments: LUCENE-5002_test.patch
>
>
> Hi all,
> We have an obvious deadlock between a "MaybeRefreshIndexJob" thread
> calling ReferenceManager.maybeRefresh(ReferenceManager.java:204) and a
> "RebuildIndexJob" thread calling
> IndexWriter.deleteAll(IndexWriter.java:2065).
> Lucene wants to flush in the "MaybeRefreshIndexJob" thread, which tries to intrinsically lock
> the IndexWriter instance at {{DocumentsWriterPerThread.java:563}} before notifyAll()ing the
> flush.
> Simultaneously the "RebuildIndexJob" thread, which has already intrinsically locked the IndexWriter
> instance at IndexWriter#deleteAll, wait()s at {{DocumentsWriterFlushControl.java:245}} for
> the flush forever, causing a deadlock.
> {code}
> "MaybeRefreshIndexJob Thread - 2" daemon prio=10 tid=0x00007f8fe4006000 nid=0x1ac2 waiting for monitor entry [0x00007f8fa7bf7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.lucene.index.IndexWriter.useCompoundFile(IndexWriter.java:2223)
> 	- waiting to lock <0x00000000f1c00438> (a org.apache.lucene.index.IndexWriter)
> 	at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:563)
> 	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:533)
> 	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
> 	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
> 	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:365)
> 	- locked <0x00000000f1c007d0> (a java.lang.Object)
> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
> 	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> 	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
> 	at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
> 	at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
> 	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:155)
> 	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:204)
> 	at jobs.MaybeRefreshIndexJob.timeout(MaybeRefreshIndexJob.java:47)
> "RebuildIndexJob Thread - 1" prio=10 tid=0x00007f903000a000 nid=0x1a38 in Object.wait() [0x00007f9037dd6000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000f1c0c240> (a org.apache.lucene.index.DocumentsWriterFlushControl)
> 	at java.lang.Object.wait(Object.java:503)
> 	at org.apache.lucene.index.DocumentsWriterFlushControl.waitForFlush(DocumentsWriterFlushControl.java:245)
> 	- locked <0x00000000f1c0c240> (a org.apache.lucene.index.DocumentsWriterFlushControl)
> 	at org.apache.lucene.index.DocumentsWriter.abort(DocumentsWriter.java:235)
> 	- locked <0x00000000f1c05370> (a org.apache.lucene.index.DocumentsWriter)
> 	at org.apache.lucene.index.IndexWriter.deleteAll(IndexWriter.java:2065)
> 	- locked <0x00000000f1c00438> (a org.apache.lucene.index.IndexWriter)
> 	at jobs.RebuildIndexJob.buildIndex(RebuildIndexJob.java:102)
> {code}
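
For readers who want to see the shape of the hang without Lucene on the classpath, here is a
minimal sketch of the same lock pattern. It is not Lucene code: NestedMonitorDemo, writerLock,
flushControl and the 500 ms timeout are all hypothetical stand-ins chosen for illustration. One
thread waits for a flush while (optionally) holding the "writer" monitor; the other must take
that same monitor before it can signal that the flush is done. A timed wait replaces the
untimed wait() from the dump so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the pattern in the thread dump; not Lucene code.
class NestedMonitorDemo {
    private final Object writerLock = new Object();   // plays the IndexWriter monitor
    private final Object flushControl = new Object(); // plays DocumentsWriterFlushControl
    private final CountDownLatch aborterReady = new CountDownLatch(1);
    private boolean flushDone = false;                // guarded by flushControl

    // Mimics deleteAll() -> abort() -> waitForFlush().
    // Returns true if the flush was observed before the timeout.
    boolean abort(boolean holdWriterLock, long timeoutMillis) throws InterruptedException {
        if (holdWriterLock) {
            synchronized (writerLock) {               // lock kept while waiting below
                return waitForFlush(timeoutMillis);
            }
        }
        return waitForFlush(timeoutMillis);
    }

    private boolean waitForFlush(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        aborterReady.countDown();                     // let the flusher start competing
        synchronized (flushControl) {
            while (!flushDone) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false;                     // gave up; the real code waits forever
                }
                flushControl.wait(remainingMillis);   // releases flushControl only
            }
            return true;
        }
    }

    // Mimics the flushing thread: it needs the writer monitor before it can notify.
    void finishFlush() throws InterruptedException {
        aborterReady.await();
        synchronized (writerLock) {
            synchronized (flushControl) {
                flushDone = true;
                flushControl.notifyAll();
            }
        }
    }

    static boolean run(boolean holdWriterLock) throws InterruptedException {
        NestedMonitorDemo demo = new NestedMonitorDemo();
        Thread flusher = new Thread(() -> {
            try {
                demo.finishFlush();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        flusher.setDaemon(true);
        flusher.start();
        boolean sawFlush = demo.abort(holdWriterLock, 500);
        flusher.join(2000);
        return sawFlush;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("abort while holding writer lock saw flush: " + run(true));
        System.out.println("abort without writer lock saw flush: " + run(false));
    }
}
```

With the writer monitor held, the waiter times out because the flusher can never reach its
notifyAll(); without it, the flush completes. This only illustrates the hang itself. As the
comment above points out, moving the wait out of the sync block does not by itself make
concurrent deleteAll() correct.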

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

