lucene-dev mailing list archives

From "Benson Margulies (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-1999) Match spotter for all query types
Date Sat, 14 Apr 2012 18:03:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254163#comment-13254163 ]

Benson Margulies commented on LUCENE-1999:
------------------------------------------

I have a potential application for this, and would be willing to work on it, assuming that
committers have any interest in committing the results.

Let me explain my particular case, which some of you may have seen discussed on solr-users.


Imagine you wanted to search for documents based on some relatively expensive similarity metric.
The metric is far too expensive to run on every single document in the index, or even on all
the documents that pass some filter first.

Further imagine that you come up with an approximation of the similarity metric in terms of
Lucene query capabilities. The approximation is an ordinary query (e.g. no Solr functions forcing
a computation on each document), and it has the same (or higher) recall as the real metric,
but lower precision. Roughly: the top 200 hits based on the approximation will contain the
top 10 hits based on the real metric.

OK, well, then, you can run this query, retrieve documents, select the top hits, and then run
the real metric on just those. You get the right answer for far less CPU time.
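
To make the two-stage idea concrete, here is a minimal sketch in plain Java, with no Lucene
API involved. The class name TwoStageRerank, the Hit type, and the realMetric callback are
all hypothetical names invented for illustration; the approximate hits are assumed to have
already come back from an ordinary query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class TwoStageRerank {
    /** A hit: document id plus a score (approximate or real, depending on the stage). */
    public static final class Hit {
        public final int docId;
        public final double score;

        public Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Keeps the best `candidates` hits by approximate score, rescores only those
     * with the expensive metric, and returns the best `k` by the real score.
     */
    public static List<Hit> rerank(List<Hit> approxHits, int candidates, int k,
                                   IntToDoubleFunction realMetric) {
        Comparator<Hit> bestFirst = Comparator.comparingDouble((Hit h) -> h.score).reversed();

        // Stage 1: cheap cut, e.g. the top 200 by the approximation.
        List<Hit> sorted = new ArrayList<>(approxHits);
        sorted.sort(bestFirst);
        List<Hit> survivors = sorted.subList(0, Math.min(candidates, sorted.size()));

        // Stage 2: the expensive metric runs on the survivors only.
        List<Hit> rescored = new ArrayList<>();
        for (Hit h : survivors) {
            rescored.add(new Hit(h.docId, realMetric.applyAsDouble(h.docId)));
        }
        rescored.sort(bestFirst);
        return new ArrayList<>(rescored.subList(0, Math.min(k, rescored.size())));
    }
}
```

With candidates = 200 and k = 10 this matches the numbers above; the result is only correct
to the extent that the approximation really does have the recall described.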

And all of this works perfectly fine with Lucene (and Solr) as we know it. However, imagine
a further challenge. You want to combine the approximation query with arbitrary other query
terms -- and then fix up the scores in the top documents to reflect the real metric.

Well, you can run a second query consisting of just the approximation to get its score
contribution, subtract it out, and add in (scaling here is a challenge) the result of the real metric.
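
As a rough sketch of that fix-up, again in plain Java: the ScoreFixup class and its
parameters are hypothetical, and the arithmetic assumes the combined score is a plain sum
of clause scores, which real Lucene scoring (normalization factors and so on) may not
guarantee. The scale factor is a stand-in for the unsolved scaling problem.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoreFixup {
    /**
     * For each top document: take the combined-query score, subtract what the
     * approximation clause contributed, and add the scaled real metric.
     *
     * @param combined   docId -> score from the full query (approximation + other terms)
     * @param approxOnly docId -> score from running the approximation query alone
     * @param real       docId -> value of the expensive real metric
     * @param scale      assumed factor mapping the real metric onto the score range
     */
    public static Map<Integer, Double> fixUp(Map<Integer, Double> combined,
                                             Map<Integer, Double> approxOnly,
                                             Map<Integer, Double> real,
                                             double scale) {
        Map<Integer, Double> fixed = new HashMap<>();
        for (Map.Entry<Integer, Double> e : combined.entrySet()) {
            int doc = e.getKey();
            // A document missing from a map is treated as contributing 0.
            double approx = approxOnly.getOrDefault(doc, 0.0);
            double realScore = real.getOrDefault(doc, 0.0);
            fixed.put(doc, e.getValue() - approx + scale * realScore);
        }
        return fixed;
    }
}
```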

Or, it seems to me, you could use the approach proposed in this issue, perhaps extended as
discussed?
 
                
> Match spotter for all query types
> ---------------------------------
>
>                 Key: LUCENE-1999
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1999
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 2.9
>            Reporter: Mark Harwood
>         Attachments: matchflagger.patch
>
>
> Related to LUCENE-1929 and the current inability to highlight NumericRangeQuery, spatial,
> cached term filters and other exotica.
> This patch provides the ability to wrap *any* Query objects and record match info as
> flags encoded in the overall document score.
> Using this approach it would be possible to understand (and therefore highlight) which
> fields matched clauses in a query.
> The match encoding approach loses some precision in scores as noted here: http://tinyurl.com/ykt8nx7
> Avoiding these precision issues would require a change to Lucene core to record docId,
> score AND a matchFlag byte in ScoreDoc objects and collector APIs.
> This may be something we should consider.
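
For readers unfamiliar with the idea of carrying flags inside a score, one way it could
work is to overwrite the lowest mantissa bits of the float. This is only an illustration
of the general trick and of why it costs precision; it is not taken from
matchflagger.patch, whose actual encoding may differ, and the MatchFlagScore class is a
hypothetical name.

```java
public class MatchFlagScore {
    // Assumption: one byte of flags, stored in the 8 least significant mantissa bits.
    private static final int FLAG_MASK = 0xFF;

    /** Overwrites the low 8 bits of the score's bit pattern with the flags. */
    public static float encode(float score, int flags) {
        int bits = Float.floatToRawIntBits(score);
        return Float.intBitsToFloat((bits & ~FLAG_MASK) | (flags & FLAG_MASK));
    }

    /** Reads the flags back out of an encoded score. */
    public static int decodeFlags(float encoded) {
        return Float.floatToRawIntBits(encoded) & FLAG_MASK;
    }
}
```

The encoded value differs from the original score by at most 255 units in the last place,
which is the kind of precision loss the description mentions. The flags also survive only
as long as nothing does arithmetic on the encoded score afterwards, which is one reason a
separate matchFlag byte would be cleaner.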

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

