lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
Date Wed, 25 Jun 2008 20:50:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608183#action_12608183 ]

Hoss Man commented on LUCENE-1316:
----------------------------------

bq. if thread A deleted a document, and then thread B checked if it was deleted, thread B
was guaranteed to see that it was in fact deleted.

Hmmm.... i'll take your word for it, but i don't follow the rationale: the current synchronization
just ensured that either the isDeleted() call will complete before the delete() call started
or vice versa -- but you have no guarantee that thread B would run after thread A and get
true. ... unless... is your point that without synchronization on the null check there's
no guarantee that B will ever see the change to deletedDocs even if it does execute after
delete()?
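
To make the visibility question concrete, here is a minimal sketch. ToyReader and isDeletedUnsafe() are hypothetical names made up for illustration; this is not the actual SegmentReader code, and delete() is only a simplified stand-in. It assumes deletions are tracked in a lazily created bit set, as the null check on deletedDocs suggests.

```java
import java.util.BitSet;

// Hypothetical stand-in for a reader; NOT Lucene's actual SegmentReader code.
class ToyReader {
    private BitSet deletedDocs; // stays null until the first delete

    // Both methods lock on `this`. Releasing a monitor happens-before the next
    // acquisition of the same monitor, so a delete() that has completed is
    // guaranteed to be visible to a later isDeleted() call on any thread.
    public synchronized void delete(int doc) {
        if (deletedDocs == null) {
            deletedDocs = new BitSet();
        }
        deletedDocs.set(doc);
    }

    public synchronized boolean isDeleted(int doc) {
        return deletedDocs != null && deletedDocs.get(doc);
    }

    // Unsynchronized variant: with no happens-before edge, the Java Memory
    // Model allows another thread to keep reading a stale null here, even
    // after delete() has finished elsewhere.
    public boolean isDeletedUnsafe(int doc) {
        BitSet d = deletedDocs;
        return d != null && d.get(doc);
    }
}
```

Called from a single thread the two read methods behave identically; the difference only matters when delete() and the read run on different threads.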

either way: robert's point about hasDeletions() needing to be synchronized seems like a bigger
issue -- isn't that a bug in the current implementation?  assuming we fix that then it seems
like the original issue is back to square one: synchro bottlenecks when there are no deletions.
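
For reference, a rough sketch of the short-circuit being discussed. DeletionSource and MatchAllSketch are hypothetical names invented for this example (the real IndexReader API is much larger); only the condition inside the loop is taken from the proposed change.

```java
// Hypothetical minimal interface; only the three calls the sketch needs.
interface DeletionSource {
    boolean hasDeletions();
    boolean isDeleted(int doc);
    int maxDoc();
}

class MatchAllSketch {
    // Counts live documents. isDeleted() is consulted only when
    // hasDeletions() returns true, because || short-circuits.
    static int countLive(DeletionSource reader) {
        int live = 0;
        for (int id = 0; id < reader.maxDoc(); id++) {
            if (!reader.hasDeletions() || !reader.isDeleted(id)) {
                live++;
            }
        }
        return live;
    }
}
```

Note that the sketch says nothing about thread safety: if hasDeletions() itself has to be synchronized, the per-document lock simply moves from isDeleted() to hasDeletions().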





> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential
> synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher
> level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in
> the stack you see the MatchAllScorer.next() method. In Solr particularly, this scorer is used
> for "NOT" queries. We saw incredibly poor performance (order of magnitude) on our load tests
> for NOT queries, due to this bottleneck. The problem is that every single document is run
> through this isDeleted() method, which is synchronized. Having an optimized index exacerbates
> this issue, as there is only a single SegmentReader to synchronize on, causing a major thread
> pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in the reader,
> much of this can be avoided, especially in a read-only production environment where
> slaves do all the high-load searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.
> We also got the same query results. I don't believe this will improve the situation for indexes
> that have deletions.
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

