lucene-dev mailing list archives

From "Benjamin Keil (JIRA)" <>
Subject [jira] Updated: (LUCENE-2013) QueryScorer and SpanRegexQuery are incompatible.
Date Wed, 28 Oct 2009 21:15:00 GMT


Benjamin Keil updated LUCENE-2013:

    Attachment: lucene-2013-2009-10-28.patch

Patch for LUCENE-2013

> QueryScorer and SpanRegexQuery are incompatible.
> ------------------------------------------------
>                 Key: LUCENE-2013
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.9
>         Environment: Lucene-Java 2.9
>            Reporter: Benjamin Keil
>         Attachments: lucene-2013-2009-10-28.patch
> Since the resolution of LUCENE-1685, users are not supposed to rewrite their queries
> before submitting them to QueryScorer:
> bq.{{------------------------------------------------------------------------
> r800796 | markrmiller | 2009-08-04 06:56:11 -0700 (Tue, 04 Aug 2009) | 1 line
> LUCENE-1685: The position aware SpanScorer has become the default scorer for Highlighting.
> The SpanScorer implementation has replaced QueryScorer and the old term highlighting QueryScorer
> has been renamed to QueryTermScorer. Multi-term queries are also now expanded by default.
> If you were previously rewriting the query for multi-term query highlighting, you should
> no longer do that (unless you switch to using QueryTermScorer). The SpanScorer API (now QueryScorer)
> has also been improved to more closely match the API of the previous QueryScorer implementation.
> ------------------------------------------------------------------------}}
> This is a great convenience for the most part, but it's causing me difficulties with
> {{SpanRegexQuery}}s: the {{WeightedSpanTermExtractor}} uses {{Query.extractTerms()}} to
> collect the fields used in the query, but {{SpanRegexQuery}} does not implement this method,
> so highlighting any query containing a {{SpanRegexQuery}} throws an {{UnsupportedOperationException}}.
> Even if this is circumvented, {{SpanRegexQuery}} still throws an
> exception when its {{getSpans()}} method is called.
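To make the failure path concrete, here is a minimal, self-contained model of the behaviour described above. It does not use Lucene: {{ModelQuery}}, {{ModelTermQuery}}, {{ModelRegexQuery}} and {{CurrentFieldCollection}} are hypothetical names, and terms are modelled as plain {{field:text}} strings. The one behaviour carried over from Lucene is that the base {{extractTerms()}} throws {{UnsupportedOperationException}} unless a subclass overrides it, so deriving field names from extracted terms fails as soon as a regex-style query is involved.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the current behaviour; these are not Lucene classes.
abstract class ModelQuery {
    // Mirrors Lucene's Query.extractTerms(): unsupported unless a subclass overrides it.
    void extractTerms(Set<String> terms) {
        throw new UnsupportedOperationException();
    }
}

class ModelTermQuery extends ModelQuery {
    final String field, text;

    ModelTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Terms are encoded as "field:text" strings in this model.
    @Override
    void extractTerms(Set<String> terms) {
        terms.add(field + ":" + text);
    }
}

// Like SpanRegexQuery: it knows its field, but cannot list its terms without
// first being rewritten against an index, so it keeps the throwing default.
class ModelRegexQuery extends ModelQuery {
    final String field, regex;

    ModelRegexQuery(String field, String regex) {
        this.field = field;
        this.regex = regex;
    }
}

final class CurrentFieldCollection {
    // Field names are derived from the extracted terms, so any query that
    // does not implement extractTerms() fails before highlighting starts.
    static Set<String> collectFields(ModelQuery query) {
        Set<String> terms = new LinkedHashSet<>();
        query.extractTerms(terms);
        Set<String> fields = new LinkedHashSet<>();
        for (String term : terms) {
            fields.add(term.substring(0, term.indexOf(':')));
        }
        return fields;
    }
}
```

A term query yields its field normally, while passing a {{ModelRegexQuery}} to {{collectFields()}} throws, which is the shape of the reported bug.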
> I can provide the patch that I am currently using, but I'm not sure that my solution
> is optimal. It adds two methods to {{SpanQuery}}: {{extractFields(Set<String> fields)}},
> which is implemented as {{fields.add(getField())}} for everything except {{MaskedFieldQuery}}, and {{mustBeRewrittenToGetSpans()}},
> which returns {{true}} by default in {{SpanQuery}} and {{false}} in {{SpanTermQuery}}, and is overridden
> in each composite {{SpanQuery}} to return a value depending on its components. In this way,
> {{SpanRegexQuery}} (and any other custom {{SpanQuery}}) does not need to be adjusted.
> Currently, fields and non-weighted terms are collected in a single step.
> In the proposed patch, {{WeightedSpanTerm}} extraction from a {{SpanQuery}} proceeds in
> two steps. First, if the {{QueryScorer}}'s field is {{null}}, the fields are collected
> from the {{SpanQuery}} using the {{extractFields()}} method. Second, the terms are collected
> using {{extractTerms()}}, rewriting the query for each field if {{mustBeRewrittenToGetSpans()}}
> returns {{true}}.
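The two proposed {{SpanQuery}} methods and the two-step extraction can be sketched outside Lucene as follows. This is a hedged, simplified model, not the attached patch: every class name ({{SketchSpanQuery}}, {{SketchSpanTermQuery}}, {{SketchSpanOrQuery}}, {{SketchSpanRegexQuery}}, {{TwoStepExtraction}}) is hypothetical, terms are plain {{field:text}} strings, and rewriting against a per-field index is approximated by passing a per-field vocabulary list to {{rewrite()}}. The {{MaskedFieldQuery}} special case is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for Lucene's span queries; not the real Lucene API.
abstract class SketchSpanQuery {
    abstract String getField();

    // Proposed hook 1: by default a query contributes only its own field.
    void extractFields(Set<String> fields) {
        fields.add(getField());
    }

    // Proposed hook 2: assume a rewrite is needed unless a subclass opts out,
    // so custom queries (like a regex span query) need no changes.
    boolean mustBeRewrittenToGetSpans() {
        return true;
    }

    // Stand-in for rewriting against an index that holds only indexedField's terms.
    SketchSpanQuery rewrite(String indexedField, List<String> vocabulary) {
        return this;
    }

    // Mirrors Query.extractTerms(): unsupported unless a subclass overrides it.
    void extractTerms(Set<String> terms) {
        throw new UnsupportedOperationException();
    }
}

class SketchSpanTermQuery extends SketchSpanQuery {
    final String field, text;
    SketchSpanTermQuery(String field, String text) { this.field = field; this.text = text; }
    @Override String getField() { return field; }
    // A single term needs no rewrite to produce spans.
    @Override boolean mustBeRewrittenToGetSpans() { return false; }
    @Override void extractTerms(Set<String> terms) { terms.add(field + ":" + text); }
}

// Composite query: it needs a rewrite only if one of its clauses does.
class SketchSpanOrQuery extends SketchSpanQuery {
    final String field;
    final List<SketchSpanQuery> clauses;
    SketchSpanOrQuery(String field, List<SketchSpanQuery> clauses) { this.field = field; this.clauses = clauses; }
    @Override String getField() { return field; }
    @Override boolean mustBeRewrittenToGetSpans() {
        for (SketchSpanQuery c : clauses) {
            if (c.mustBeRewrittenToGetSpans()) return true;
        }
        return false;
    }
    @Override SketchSpanQuery rewrite(String indexedField, List<String> vocabulary) {
        List<SketchSpanQuery> rewritten = new ArrayList<>();
        for (SketchSpanQuery c : clauses) rewritten.add(c.rewrite(indexedField, vocabulary));
        return new SketchSpanOrQuery(field, rewritten);
    }
    @Override void extractTerms(Set<String> terms) {
        for (SketchSpanQuery c : clauses) c.extractTerms(terms);
    }
}

// Overrides neither proposed hook, just as SpanRegexQuery would not need to;
// it only yields terms after being rewritten into matching term clauses.
class SketchSpanRegexQuery extends SketchSpanQuery {
    final String field, regex;
    SketchSpanRegexQuery(String field, String regex) { this.field = field; this.regex = regex; }
    @Override String getField() { return field; }
    @Override SketchSpanQuery rewrite(String indexedField, List<String> vocabulary) {
        List<SketchSpanQuery> matches = new ArrayList<>();
        if (field.equals(indexedField)) {
            for (String t : vocabulary) {
                if (t.matches(regex)) matches.add(new SketchSpanTermQuery(field, t));
            }
        }
        return new SketchSpanOrQuery(field, matches);
    }
}

final class TwoStepExtraction {
    // vocabularyByField is a hypothetical stand-in for the per-field index.
    static Set<String> extract(SketchSpanQuery query, String scorerField,
                               Map<String, List<String>> vocabularyByField) {
        // Step 1: collect fields from the query only when the scorer has no field.
        Set<String> fields = new LinkedHashSet<>();
        if (scorerField == null) query.extractFields(fields);
        else fields.add(scorerField);

        // Step 2: collect terms, rewriting per field only when required.
        Set<String> terms = new LinkedHashSet<>();
        for (String field : fields) {
            SketchSpanQuery q = query.mustBeRewrittenToGetSpans()
                    ? query.rewrite(field, vocabularyByField.getOrDefault(field, List.of()))
                    : query;
            q.extractTerms(terms);
        }
        return terms;
    }
}
```

For an or-query over {{body:lucene}} and a regex clause {{high.*}}, the regex clause makes {{mustBeRewrittenToGetSpans()}} return {{true}}, so the query is rewritten against the {{body}} vocabulary before {{extractTerms()}} runs; a query built only from term clauses skips the rewrite. Defaulting the hook to {{true}} keeps unknown custom queries safe at the cost of an occasional unnecessary rewrite.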

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.