lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4386) Query parser should generate FieldValueFilter for pure wildcard terms to boost query performance
Date Thu, 13 Sep 2012 23:14:07 GMT


Uwe Schindler commented on LUCENE-4386:

The reason for my commit is not "unsafe" or whatever. It is simply that this filter needs FieldCache,
and that has a large performance impact on the first call when the filter is built automatically by the QueryParser.

I am strongly against adding this to Lucene's QueryParser by default. Solr already has support
for *:* and similar, so it could use this filter in its own QueryParser impl (as a replacement
for the current ConstantScore RangeQuery, which is slow).

> Query parser should generate FieldValueFilter for pure wildcard terms to boost query performance
> ------------------------------------------------------------------------------------------------
>                 Key: LUCENE-4386
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>    Affects Versions: 4.0-BETA
>            Reporter: Jack Krupansky
>             Fix For: 4.0
> In theory, a simple pure wildcard query (a single asterisk) is an inefficient way to
> select all documents that have any value in a field. Rather than users having to work around
> this issue by adding a separate boolean "has" field, it would be better to have the query
> parser directly generate the most efficient Lucene query for detecting all documents that
> have any value for a specified field. According to the discussion over on LUCENE-4376, the
> FieldValueFilter is the proper solution.
> Proposed solution:
> QueryParserBase.getPrefixQuery could detect when the query is a pure wildcard (a single
> asterisk) and then generate a FieldValueFilter instead of a PrefixQuery. My understanding
> from LUCENE-4376 is that the following would work:
> {code}
> new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
> {code}
> Oh, and the check for whether "leading wildcard" is enabled would need to be bypassed
> for this case.
> I still think it would be better to have PrefixQuery perform this optimization internally
> so that all apps would benefit, but this should be sufficient to address the main concern.
> This change would improve the classic Lucene query parser and other query parsers
> based on it, including edismax. There might be other query parsers which won't see the impact
> of this change, but they can be updated separately.
> How much performance benefit? Unknown, but supposedly significant. The goal is simply
> to have a simple pure wildcard be the obvious tool to select fields that have a value in a
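
The decision described in the proposal above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: the class `PureWildcardSketch`, the `Plan` enum, and the `plan` method are hypothetical names invented for this sketch, not part of Lucene or of any patch on the issue. It models just the branching: a term that is exactly one asterisk is routed to the FieldValueFilter path before the "leading wildcard" check runs, and every other term takes the parser's existing path.

```java
import java.util.Objects;

/** Lucene-free sketch of the branching proposed in the issue (all names are hypothetical). */
public class PureWildcardSketch {

    /** Hypothetical labels for what the parser would do with a term. */
    enum Plan { FIELD_VALUE_FILTER, REGULAR_QUERY, REJECT_LEADING_WILDCARD }

    /** True only when the term text is exactly one asterisk. */
    static boolean isPureWildcard(String termText) {
        return "*".equals(termText);
    }

    /** Decide how a term would be handled, given the parser's leading-wildcard setting. */
    static Plan plan(String termText, boolean allowLeadingWildcard) {
        Objects.requireNonNull(termText, "termText");
        if (isPureWildcard(termText)) {
            // Checked first, so the leading-wildcard restriction is bypassed for this case.
            return Plan.FIELD_VALUE_FILTER;
        }
        boolean leadingWildcard = termText.startsWith("*") || termText.startsWith("?");
        if (leadingWildcard && !allowLeadingWildcard) {
            return Plan.REJECT_LEADING_WILDCARD;
        }
        // Everything else stays on the parser's existing path.
        return Plan.REGULAR_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(plan("*", false));    // FIELD_VALUE_FILTER
        System.out.println(plan("foo*", false)); // REGULAR_QUERY
        System.out.println(plan("*foo", false)); // REJECT_LEADING_WILDCARD
    }
}
```

In a real parser, the FIELD_VALUE_FILTER branch is where the ConstantScoreQuery construction quoted in the {code} block would go.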
