lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9470) Deadlocked threads in recovery
Date Mon, 12 Sep 2016 04:35:20 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483039#comment-15483039 ]

David Smiley commented on SOLR-9470:
------------------------------------

Nice analysis -- a lock ordering problem.  I don't have much familiarity with this internal
aspect of Solr, but I have more faith that the SolrCore.getSearcher() code path acquires locks
in the right order (it's hammered all the time) than I do in the IndexFetcher/replication path.
It makes sense to me that getSearcher first obtains the openSearcher lock and then the
indexWriter lock.
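
To illustrate what "the right order" means here, below is a minimal sketch, not Solr code.
The names searcherLock, iwLock, withBothLocks and runConcurrently are hypothetical stand-ins
I made up for the example; the only assumption is that both locks behave like plain
ReentrantLocks.  Every path takes the searcher lock first and the writer lock second, so no
thread can hold the second lock while waiting for the first.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingSketch {
    // Hypothetical stand-ins for the two locks discussed above; not the actual Solr fields.
    static final ReentrantLock searcherLock = new ReentrantLock();
    static final ReentrantLock iwLock = new ReentrantLock();
    static final AtomicInteger completed = new AtomicInteger();

    // Every caller acquires searcherLock first, then iwLock, and releases in reverse order.
    static void withBothLocks(Runnable work) {
        searcherLock.lock();
        try {
            iwLock.lock();
            try {
                work.run();
            } finally {
                iwLock.unlock();
            }
        } finally {
            searcherLock.unlock();
        }
    }

    // Runs the given number of tasks on two threads and returns how many completed.
    static int runConcurrently(int tasks) throws InterruptedException {
        completed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> withBothLocks(completed::incrementAndGet));
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish; possible deadlock");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed tasks: " + runConcurrently(1000));
    }
}
```

If one code path instead took iwLock first and then waited for searcherLock, two threads
could each hold one lock while waiting for the other, which is the general shape of the
deadlock reported in this issue.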

I don't follow something you said: you explained how IndexFetcher (line 520) grabs the iwLock
by calling DefaultSolrCoreState.newIndexWriter (DefaultSolrCoreState.java:210).  I see this.
However, that method promptly releases the lock.  Granted, I think the openSearcher lock should
be held during all this, and it doesn't seem to be; even so, the stack trace doesn't show that
to be a problem in this instance.  I do see that the iwLock is held (by cross-referencing
its memory reference with that of another thread awaiting it), but it's not evident to me
where exactly iwLock is acquired _such that it isn't released at the time IndexFetcher line
523 is reached_.

> Deadlocked threads in recovery
> ------------------------------
>
>                 Key: SOLR-9470
>                 URL: https://issues.apache.org/jira/browse/SOLR-9470
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.2
>            Reporter: Michael Braun
>         Attachments: solr-deadlock.txt
>
>
> Background: Booted up a cluster and replicas were in recovery. All replicas recovered
> minus one, and it was hanging on HTTP requests. Issued shutdown and solr would not shut down.
> Examined with JStack and found a deadlock had occurred. The relevant thread information is
> attached. Some information has been redacted as well (some custom URPs, IPs) from the stack
> traces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

