lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-4386) Query parser should generate FieldValueFilter for pure wildcard terms to boost query performance
Date Thu, 13 Sep 2012 21:13:08 GMT
Jack Krupansky created LUCENE-4386:
--------------------------------------

             Summary: Query parser should generate FieldValueFilter for pure wildcard terms
to boost query performance
                 Key: LUCENE-4386
                 URL: https://issues.apache.org/jira/browse/LUCENE-4386
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/queryparser
    Affects Versions: 4.0-BETA
            Reporter: Jack Krupansky
             Fix For: 4.0


In theory, a simple pure wildcard query (a single asterisk) is an inefficient way to select
all documents that have any value in a field. Rather than users having to work around this
issue by adding a separate boolean "has" field, it would be better to have the query parser
directly generate the most efficient Lucene query for detecting all documents that have any
value for a specified field. According to the discussion over on LUCENE-4376, the FieldValueFilter
is the proper solution.

Proposed solution:

QueryParserBase.getPrefixQuery could detect when the query is a pure wildcard (a single asterisk)
and then generate a FieldValueFilter instead of a PrefixQuery. My understanding from LUCENE-4376
is that the following would work:

{code}
new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
{code}

Oh, and the check for whether "leading wildcard" is enabled would need to be bypassed for
this case.
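
The ordering described above (test for the pure wildcard first, and only then apply the "leading wildcard" check) can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: the class PureWildcardSketch, the Outcome enum, and the choose method are hypothetical names that do not exist in Lucene, and the enum values merely stand in for the query objects the parser would build.

```java
// Dependency-free sketch of the proposed decision order.
// All names here are hypothetical; they are not part of the Lucene API.
final class PureWildcardSketch {

    /** Stand-ins for what the query parser would produce. */
    enum Outcome {
        // Would correspond to:
        // new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
        FIELD_VALUE_FILTER,
        // The parser's existing prefix/wildcard handling, unchanged.
        EXISTING_HANDLING,
        // The parser's existing rejection when leading wildcards are disabled.
        LEADING_WILDCARD_REJECTED
    }

    /**
     * Decide how a term should be handled.
     *
     * @param termText             the term text as typed, e.g. "*" or "abc*"
     * @param allowLeadingWildcard whether leading wildcards are enabled
     */
    static Outcome choose(String termText, boolean allowLeadingWildcard) {
        // The pure wildcard is tested first, so it bypasses the
        // leading-wildcard check entirely.
        if ("*".equals(termText)) {
            return Outcome.FIELD_VALUE_FILTER;
        }
        // Every other term still goes through the leading-wildcard check.
        if (!allowLeadingWildcard
                && (termText.startsWith("*") || termText.startsWith("?"))) {
            return Outcome.LEADING_WILDCARD_REJECTED;
        }
        return Outcome.EXISTING_HANDLING;
    }

    public static void main(String[] args) {
        // Print the outcome for a few sample terms with leading wildcards disabled.
        for (String term : new String[] {"*", "abc*", "*abc"}) {
            System.out.println(term + " -> " + choose(term, false));
        }
    }
}
```

In a real patch the first branch would return the ConstantScoreQuery shown earlier instead of an enum value; the only point of the sketch is that the pure-wildcard test has to come before the leading-wildcard test.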

I still think it would be better to have PrefixQuery perform this optimization internally
so that all apps would benefit, but this should be sufficient to address the main concern.

This change would benefit the classic Lucene query parser and other query parsers based
on it, including edismax. Other query parsers might not see the impact of this change,
but they can be updated separately.

How much performance benefit? Unknown, but supposedly significant. The goal is simply to have
a simple pure wildcard be the obvious tool to select documents that have a value in a field.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

