Date: Wed, 12 Sep 2012 18:49:07 +1100 (NCT)
From: "Uwe Schindler (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <290145376.68314.1347436147985.JavaMail.jiratomcat@arcas>
In-Reply-To: <414664229.66509.1347405608664.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (LUCENE-4376) Add Query subclasses for selecting documents where a field is empty or not

[ https://issues.apache.org/jira/browse/LUCENE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453811#comment-13453811 ]

Uwe Schindler commented on LUCENE-4376:
---------------------------------------

The filter is already there; QueryParser just does not support it. To make this work for your use case, you can override Lucene's/Solr's QueryParser to return a ConstantScoreQuery wrapping the LUCENE-3593 filter as a replacement for the "field:*"-only query. The positive and negative variants are both covered by the boolean argument to the filter.
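As a rough illustration of what the boolean argument selects, here is a toy model in plain Java. It is not Lucene code: the class name, the method name, and the idea of passing in a ready-made per-document "has a value" bit set are assumptions made only for this sketch, and it uses nothing beyond java.util.BitSet.

```java
import java.util.BitSet;

/** Toy model only (hypothetical names, not Lucene's implementation). */
public class FieldValueFilterSketch {

    /**
     * Returns the ids of the matching documents.
     *
     * @param docsWithField bit i is set when document i has at least one value in the field
     * @param maxDoc        number of documents in the hypothetical index
     * @param negate        false selects documents that have a value, true selects those that do not
     */
    public static BitSet matchingDocs(BitSet docsWithField, int maxDoc, boolean negate) {
        // Work on a copy so the caller's bit set is left untouched.
        BitSet result = (BitSet) docsWithField.clone();
        if (negate) {
            // Invert only the valid document range [0, maxDoc).
            result.flip(0, maxDoc);
        }
        // Drop any stray bits at or beyond maxDoc.
        if (result.length() > maxDoc) {
            result.clear(maxDoc, result.length());
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet docsWithField = new BitSet();
        docsWithField.set(0);
        docsWithField.set(2);
        docsWithField.set(3);

        // Five documents in total; documents 1 and 4 have no value in the field.
        System.out.println(matchingDocs(docsWithField, 5, false)); // prints {0, 2, 3}
        System.out.println(matchingDocs(docsWithField, 5, true));  // prints {1, 4}
    }
}
```

The point of the sketch is that both variants are a single pass over one bit per document, so the cost does not depend on how many distinct terms the field contains.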
To conclude: the Query is already there, so there is no need for the two new classes. The wanted functionality is:

{code:java}
new ConstantScoreQuery(new FieldValueFilter(String field, boolean negate))
{code}

To find all documents with any term in the field, use negate=false; to find all documents with no term in the field, use negate=true. There is absolutely no need for a new Query.

bq. Okay, so would it be straightforward and super-efficient for PrefixQuery to do exactly that if the prefix term is zero-length?

That's super-slow, as it will search through all terms in the field. This is what Solr, for example, currently does for "field:*" queries. Solr should use the filter too; this would make those queries much more efficient.

> Add Query subclasses for selecting documents where a field is empty or not
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-4376
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4376
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: Jack Krupansky
>             Fix For: 5.0
>
>
> Users frequently wish to select documents based on whether a specified sparsely-populated field has a value or not. Lucene should provide specific Query subclasses that optimize for these two cases, rather than force users to guess what workaround might be most efficient. It is simplest for users to use a simple pure wildcard term to check for non-empty fields or a negated pure wildcard term to check for empty fields, but it has been suggested that this can be rather inefficient, especially for text fields with many terms.
> 1. Add NonEmptyFieldQuery - selects all documents that have a value for the specified field.
> 2. Add EmptyFieldQuery - selects all documents that do not have a value for the specified field.
> The query parsers could turn a pure wildcard query (asterisk only) into a NonEmptyFieldQuery, and a negated pure wildcard query into an EmptyFieldQuery.
> Alternatively, maybe PrefixQuery could detect a pure wildcard and automatically "rewrite" it into NonEmptyFieldQuery.
> My assumption is that if the actual values of the field are not needed, Lucene can much more efficiently simply detect whether values are present, rather than, for example, the user having to create a separate boolean "has value" field that they would query for true or false.