Date: Fri, 14 Sep 2012 08:19:07 +1100 (NCT)
From: "Hoss Man (JIRA)"
To: dev@lucene.apache.org
Message-ID: <160070455.77236.1347571147533.JavaMail.jiratomcat@arcas>
In-Reply-To: <1945329843.77191.1347570788104.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (LUCENE-4386) Query parser should generate FieldValueFilter for pure wildcard terms to boost query performance

    [ https://issues.apache.org/jira/browse/LUCENE-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455303#comment-13455303 ]

Hoss Man commented on LUCENE-4386:
----------------------------------

I'm confused. As Uwe already noted in LUCENE-4376...

bq. The problem is that it implicitely needs to build the FieldCache for that field, so automatism is no-go here. If you need that functionality, modify QueryParser.

...that sounds to me like a pretty clear "we cannot automate this" response, because using this class requires the FieldCache, and we can't know or assume if or when the FieldCache is safe for a field.

Am I missing something?

> Query parser should generate FieldValueFilter for pure wildcard terms to boost query performance
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4386
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4386
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>    Affects Versions: 4.0-BETA
>            Reporter: Jack Krupansky
>             Fix For: 4.0
>
>
> In theory, a simple pure wildcard query (a single asterisk) is an inefficient way to select all documents that have any value in a field. Rather than users having to work around this issue by adding a separate boolean "has" field, it would be better to have the query parser directly generate the most efficient Lucene query for detecting all documents that have any value for a specified field. According to the discussion over on LUCENE-4376, the FieldValueFilter is the proper solution.
> Proposed solution:
> QueryParserBase.getPrefixQuery could detect when the query is a pure wildcard (a single asterisk) and then generate a FieldValueFilter instead of a PrefixQuery. My understanding from LUCENE-4376 is that the following would work:
> {code}
> new ConstantScoreQuery(new FieldValueFilter(fieldname, false))
> {code}
> Oh, and the check for whether "leading wildcard" is enabled would need to be bypassed for this case.
> I still think it would be better to have PrefixQuery perform this optimization internally so that all apps would benefit, but this should be sufficient to address the main concern.
> This change would benefit the classic Lucene query parser and other query parsers based on it, including edismax.
> There might be other query parsers which won't see the impact of this change, but they can be updated separately.
> How much performance benefit? Unknown, but supposedly significant. The goal is simply to have a simple pure wildcard be the obvious tool to select documents that have a value in a field.