lucene-dev mailing list archives

From "Ishan Chattopadhyaya (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-10159) DBQ, where query is based on updated value, reordered with the update doesn't work
Date Sun, 19 Feb 2017 01:19:44 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873393#comment-15873393 ]

Ishan Chattopadhyaya edited comment on SOLR-10159 at 2/19/17 1:19 AM:
----------------------------------------------------------------------

Finally figured out the culprit!

BufferedUpdatesStream executes the DBQs using a searcher whose queryCache is set to null.
When the query contains a DeleteByQueryWrapper clause (a Solr class), its createWeight() method
obtains its own searcher (privateContext). This searcher's queryCache is not set to null, so
it caches the queries.

In the case of reordered DBQs, the DBQ is executed twice: first when it cannot delete anything,
because the query matches no documents yet, and again when it should match. Unfortunately, the
result cached during the first execution was reused in the second one, so it again returned 0
results (even though the document now has the updated value).
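To make this failure mode concrete, here is a toy model in Java. ToySearcher, runScenario and the string-keyed cache are hypothetical stand-ins, not Lucene's IndexSearcher or LRUQueryCache (which works per segment and applies a caching policy). The only point it illustrates is that a cache which remembers the first, empty result of "dv:300" keeps returning it after the in-place update, while an uncached search sees the new value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

/**
 * Toy model (hypothetical names, not Lucene's API) of a DBQ that is executed
 * twice, before and after an in-place update, through a searcher that may cache.
 */
public class StaleCacheDemo {

    static class ToySearcher {
        private final Map<Integer, Long> dv;             // id -> dv, mutated in place
        private final Map<String, Set<Integer>> cache;   // null means caching is disabled

        ToySearcher(Map<Integer, Long> dv, Map<String, Set<Integer>> cache) {
            this.dv = dv;
            this.cache = cache;
        }

        Set<Integer> search(String queryKey, LongPredicate matches) {
            if (cache != null && cache.containsKey(queryKey)) {
                return cache.get(queryKey);              // cache hit: may be stale
            }
            Set<Integer> hits = new TreeSet<>();
            for (Map.Entry<Integer, Long> e : dv.entrySet()) {
                if (matches.test(e.getValue())) {
                    hits.add(e.getKey());
                }
            }
            if (cache != null) {
                cache.put(queryKey, hits);               // remember this result
            }
            return hits;
        }
    }

    /** Returns the hits of the first (early) and second (post-update) DBQ executions. */
    static List<Set<Integer>> runScenario(boolean useCache) {
        Map<Integer, Long> dv = new HashMap<>();
        Map<String, Set<Integer>> cache = useCache ? new HashMap<>() : null;
        ToySearcher searcher = new ToySearcher(dv, cache);
        LongPredicate dvIs300 = v -> v == 300L;

        dv.put(0, 200L);                                           // ADD id=0, dv=200
        Set<Integer> first = searcher.search("dv:300", dvIs300);   // early DBQ: no match
        dv.put(0, 300L);                                           // in-place UPD id=0, dv=300
        Set<Integer> second = searcher.search("dv:300", dvIs300);  // buffered DBQ, run again

        List<Set<Integer>> result = new ArrayList<>();
        result.add(first);
        result.add(second);
        return result;
    }

    public static void main(String[] args) {
        System.out.println("cached:   " + runScenario(true));   // cached:   [[], []]
        System.out.println("uncached: " + runScenario(false));  // uncached: [[], [0]]
    }
}
```

With caching, the second execution returns the same empty set that was stored during the first one, which matches the symptom described above.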

The fix is to set the queryCache of the DBQW's privateContext to whatever the initial searcher's
queryCache was set to (i.e. null when the DBQ is executed by BufferedUpdatesStream).
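A similarly hypothetical sketch of the fix. In Lucene, a newly constructed IndexSearcher uses a shared default query cache unless setQueryCache() is called, and setQueryCache(null) disables caching; the ToySearcher below mimics only that, with the shared default passed in explicitly. executeDbq is an illustrative stand-in for the wrapper's createWeight() building its private searcher. This models the idea described above; it is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

/**
 * Toy model (hypothetical classes, not Lucene/Solr code) of the fix: the private
 * searcher built for the wrapper clause inherits the outer searcher's cache
 * setting instead of silently using the shared default cache.
 */
public class InheritCacheDemo {

    static class ToySearcher {
        final Map<Integer, Long> dv;                     // id -> dv, updated in place
        private Map<String, Set<Integer>> queryCache;

        ToySearcher(Map<Integer, Long> dv, Map<String, Set<Integer>> defaultCache) {
            this.dv = dv;
            this.queryCache = defaultCache;              // new searchers start with the shared default
        }

        void setQueryCache(Map<String, Set<Integer>> cache) { this.queryCache = cache; } // null = no caching
        Map<String, Set<Integer>> getQueryCache() { return queryCache; }

        Set<Integer> search(String key, LongPredicate matches) {
            if (queryCache != null && queryCache.containsKey(key)) {
                return queryCache.get(key);              // cache hit: may be stale
            }
            Set<Integer> hits = new TreeSet<>();
            dv.forEach((id, value) -> {
                if (matches.test(value)) {
                    hits.add(id);
                }
            });
            if (queryCache != null) {
                queryCache.put(key, hits);
            }
            return hits;
        }
    }

    /** Stand-in for the wrapper's createWeight(): builds a private searcher and runs "dv:300". */
    static Set<Integer> executeDbq(ToySearcher outer, Map<String, Set<Integer>> defaultCache,
                                   boolean inheritCache) {
        ToySearcher privateSearcher = new ToySearcher(outer.dv, defaultCache);
        if (inheritCache) {
            privateSearcher.setQueryCache(outer.getQueryCache()); // the fix: copy the outer setting
        }
        return privateSearcher.search("dv:300", v -> v == 300L);
    }

    /** Runs the reordered case and returns the hits of the second (post-update) execution. */
    static Set<Integer> secondExecutionHits(boolean inheritCache) {
        Map<Integer, Long> dv = new HashMap<>();
        Map<String, Set<Integer>> defaultCache = new HashMap<>(); // shared by all new searchers
        ToySearcher outer = new ToySearcher(dv, defaultCache);
        outer.setQueryCache(null);                       // like BufferedUpdatesStream: caching disabled

        dv.put(0, 200L);                                 // ADD id=0, dv=200
        executeDbq(outer, defaultCache, inheritCache);   // early DBQ: matches nothing
        dv.put(0, 300L);                                 // in-place UPD id=0, dv=300
        return executeDbq(outer, defaultCache, inheritCache);
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + secondExecutionHits(false)); // without fix: []
        System.out.println("with fix:    " + secondExecutionHits(true));  // with fix:    [0]
    }
}
```

The design point is that the private searcher should not decide on its own whether to cache; propagating the caller's setting keeps the "no caching" decision made by the outer searcher intact.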

Planning to commit the attached patch soon. Would be great if someone could review.



> DBQ, where query is based on updated value, reordered with the update doesn't work
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-10159
>                 URL: https://issues.apache.org/jira/browse/SOLR-10159
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>         Attachments: SOLR-10159.patch, SOLR-10159.patch
>
>
> h2. Background/History
> If a value that was recently updated in place is used in a DBQ, the DBQ doesn't work at the
> Lucene level unless there is an explicit commit between the update and the DBQ, due to
> LUCENE-7344. To work around this, Yonik suggested that we use ulog.openRealtimeSearcher() just
> before the DBQ is performed. This worked fine.
> Example:
> {code}
> ADD: [id=0, dv=200, title="mytitle", _version_=100]
> UPD: [id=0, dv=300, _version_=200]
> DBQ: q="dv:300", _version_=300
> {code}
> h2. Problem discovered now
> Suppose that, in the above example, the last two commands are reordered at the replica. What
> happens is: (i) the full document (_version_ 100) is received and indexed; (ii) the DBQ is
> received out of order and applied, no document is deleted [so far so good], and the DBQ is
> buffered in the ulog.deleteByQueries map; (iii) the in-place update (_version_ 200) arrives and
> is applied to the document that was added in step (i). After that, the buffered DBQ is applied
> again (at DUH2.addAndDelete()). This buffered DBQ, which is based on the value updated
> immediately before (in step iii), fails to delete the document.
> h2. What happens exactly?
> The initial DBQ query is {{"dv:300"}}, but when it is applied, it is expanded to
> {{"+dv:[300 TO 300] -ConstantScore(frange(long(_version_)):[300 TO *])"}}. Even though
> ulog.openRealtimeSearcher() is called just before the DBQ, the document is not deleted.
> A different version of the query, i.e. {{"+dv:[300 TO 300] +_version_:[200 TO 200]"}}, also
> doesn't work. As I found out, *this happened due to the presence of two clauses*!
> {{"+dv:[300 TO 300]"}} works, and so does {{"+_version_:[200 TO 200]"}}, but the two clauses
> don't work together. Surprisingly, even {{"+dv:[300 TO 300] +dv:[300 TO 300]"}} (the same
> clause repeated) doesn't work.
> h2. Investigation at Lucene level
> After some tedious investigation into the internals of Lucene, I discovered that if I change
> the internal search (in BufferedUpdates) to use Sort.RELEVANCE instead of Sort.INDEXORDER
> (which, I think, is the default when using a weight/scorer), the DBQ is applied correctly.
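The reordered scenario in the quoted description can be modeled in a few lines. The sketch below is a hypothetical, cache-free model (Doc, Dbq and ReorderedReplay are illustrative names, not Solr classes): documents are reduced to dv and _version_, and a DBQ deletes documents matching dv:300 whose _version_ is below the DBQ's own version, which is what the -frange(_version_):[300 TO *] clause of the expanded query expresses. A DBQ that arrives early is buffered and re-applied after a later update, loosely mirroring the roles of ulog.deleteByQueries and DUH2.addAndDelete(); re-applying only DBQs with a higher version than the update is an assumption of this sketch. In this model the re-applied DBQ does delete the document, which is the expected outcome the real code failed to produce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, cache-free model of the reordered ADD / DBQ / in-place UPD case. */
public class ReorderedReplay {

    /** A document reduced to the two fields the example uses. */
    static final class Doc {
        long dv;
        long version;
        Doc(long dv, long version) { this.dv = dv; this.version = version; }
    }

    /** A DBQ of the form q="dv:<dvValue>", carrying its own _version_. */
    static final class Dbq {
        final long dvValue;
        final long version;
        Dbq(long dvValue, long version) { this.dvValue = dvValue; this.version = version; }

        /** Expanded form: +dv:[v TO v] -frange(_version_):[dbqVersion TO *]. */
        boolean matches(Doc doc) {
            boolean required = doc.dv == dvValue;
            boolean prohibited = doc.version >= version;
            return required && !prohibited;
        }
    }

    final Map<Integer, Doc> index = new LinkedHashMap<>();
    final List<Dbq> bufferedDbqs = new ArrayList<>();   // plays the role of ulog.deleteByQueries

    void add(int id, long dv, long version) {
        index.put(id, new Doc(dv, version));
    }

    void deleteByQuery(Dbq dbq) {
        index.values().removeIf(dbq::matches);  // apply now (may match nothing)
        bufferedDbqs.add(dbq);                  // remember it for late-arriving updates
    }

    void inPlaceUpdate(int id, long dv, long version) {
        Doc doc = index.get(id);
        if (doc == null) {
            return;                             // document already gone; nothing to update
        }
        doc.dv = dv;
        doc.version = version;
        // Re-apply every buffered DBQ that is newer than this update (assumed rule).
        for (Dbq dbq : bufferedDbqs) {
            if (dbq.version > version) {
                index.values().removeIf(dbq::matches);
            }
        }
    }

    public static void main(String[] args) {
        ReorderedReplay replica = new ReorderedReplay();
        replica.add(0, 200, 100);                  // (i)   ADD id=0, dv=200, _version_=100
        replica.deleteByQuery(new Dbq(300, 300));  // (ii)  early DBQ dv:300, _version_=300
        System.out.println("after DBQ:    " + replica.index.keySet()); // after DBQ:    [0]
        replica.inPlaceUpdate(0, 300, 200);        // (iii) UPD id=0, dv=300, _version_=200
        System.out.println("after update: " + replica.index.keySet()); // after update: []
    }
}
```

Because this model evaluates the DBQ directly against current field values, it has no way to reproduce the stale result; that only appears once a cache sits between the DBQ and the updated doc values.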



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


