Date: Thu, 16 Feb 2012 16:43:31 -0500
Subject: Re: query for documents WITHOUT a field?
From: Jamie Johnson
To: java-user@lucene.apache.org

Another possible solution is to insert a custom token while indexing, one that
could not otherwise show up in the index, and then filter on that token.

On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler wrote:
> As the documentation states:
> Lucene is an inverted index that does not have per-document fields. It only
> knows terms pointing to documents. The query you are looking for is one
> that returns all documents which have no term. To execute this query, it
> gets the term index, iterates all terms of a field, marks their documents
> in a bitset, and negates that. The filter/query I told you about uses the
> FieldCache to do this. Since 3.6 (also in 3.5, but there it is buggy and
> the API is different) there is another fieldcache that returns exactly
> that bitset. The filter mentioned only uses that bitset from this new
> fieldcache. The FieldCache is populated on first access and stays alive as
> long as the underlying index segment is open (that is, as long as the
> IndexReader is open and those parts of the index are not refreshed). If
> you are also sorting against your fields or doing other queries using
> FieldCache, there is no overhead; otherwise the bitset is populated on
> first access to the filter.
>
> Lucene 3.5 has no easy way to implement that filter; a "NULL" pseudo term
> is the only solution (which is also much faster than the first, uncached
> access to the filter in Lucene 3.6). Later accesses hitting the cache in
> 3.6 will be faster, of course.
>
> Another hacky way to achieve the same results (works with almost any
> Lucene version):
> a BooleanQuery consisting of MatchAllDocsQuery() as a MUST clause and
> PrefixQuery(field, "") as a MUST_NOT clause. But the PrefixQuery will do a
> full term index scan without caching :-). You may use CachingWrapperFilter
> with PrefixFilter instead.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Tim Eck [mailto:timeck@gmail.com]
>> Sent: Thursday, February 16, 2012 10:14 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Thanks for the fast response. I'll certainly have a look at the upcoming
>> 3.6.x release. What is the expected performance for using a negated filter?
>> In particular, does it defeat the index in any way and require a full
>> index scan? Is it different between regular fields and numeric fields?
>>
>> For 3.5 and earlier, though, is there any suggestion other than magic
>> values?
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Thursday, February 16, 2012 1:07 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Lucene 3.6 will have a FieldValueFilter that can be negated:
>>
>> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
>>
>> (see http://goo.gl/wyjxn)
>>
>> Lucene 3.5 does not yet have it; you can download 3.6 snapshots from
>> Jenkins: http://goo.gl/Ka0gr
>>
>> > -----Original Message-----
>> > From: Tim Eck [mailto:teck@terracottatech.com]
>> > Sent: Thursday, February 16, 2012 9:59 PM
>> > To: java-user@lucene.apache.org
>> > Subject: query for documents WITHOUT a field?
>> >
>> > My apologies if this answer is readily available someplace; I've
>> > searched around and not found a definitive answer.
>> >
>> > I'd like to run a query for documents that _do not_ contain particular
>> > indexed fields, to implement something like a SQL query where a column
>> > is null.
>> >
>> > I understand I could possibly use a magic value to represent "null", but
>> > the data I'm searching doesn't lend itself to reserving a value for null.
>> > I also understand I could add an extra field to hold this boolean isNull
>> > state, but would love a better solution :-)
>> >
>> > TIA

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
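
The two ideas discussed in this thread (scanning all terms of a field into a bitset and negating it, versus indexing a sentinel token for documents that lack the field) can be illustrated with a toy inverted index. This is only a rough sketch in plain Java, not Lucene code: the class MissingFieldSketch, the NULL_TOKEN value, and all method names are made up for illustration, and each field is assumed to hold a single term per document.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy inverted index (field -> term -> doc ids). Hypothetical, not the Lucene API. */
public class MissingFieldSketch {
    // Assumed sentinel; it must never occur as a real value in the data.
    static final String NULL_TOKEN = "\u0000NULL\u0000";

    private final Map<String, Map<String, BitSet>> postings = new HashMap<>();
    private int maxDoc = 0;

    /** Indexes a document; each listed nullable field that is absent gets the sentinel. */
    int addDocument(Map<String, String> fields, String... nullableFields) {
        int docId = maxDoc++;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            post(e.getKey(), e.getValue(), docId);
        }
        for (String field : nullableFields) {
            if (!fields.containsKey(field)) {
                post(field, NULL_TOKEN, docId);
            }
        }
        return docId;
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new BitSet())
                .set(docId);
    }

    /** Scan approach: visit every real term of the field, mark its docs, then negate. */
    BitSet missingByScan(String field) {
        BitSet hasValue = new BitSet(maxDoc);
        Map<String, BitSet> terms = postings.getOrDefault(field, Map.of());
        for (Map.Entry<String, BitSet> e : terms.entrySet()) {
            if (!NULL_TOKEN.equals(e.getKey())) { // the sentinel is not a real value
                hasValue.or(e.getValue());
            }
        }
        hasValue.flip(0, maxDoc); // docs with no term in the field remain set
        return hasValue;
    }

    /** Sentinel approach: a single term lookup, no scan over the field's terms. */
    BitSet missingBySentinel(String field) {
        BitSet hit = postings.getOrDefault(field, Map.of()).get(NULL_TOKEN);
        return hit == null ? new BitSet() : (BitSet) hit.clone();
    }

    public static void main(String[] args) {
        MissingFieldSketch index = new MissingFieldSketch();
        index.addDocument(Map.of("title", "a", "author", "x"), "author");
        index.addDocument(Map.of("title", "b"), "author"); // no author
        index.addDocument(Map.of("title", "c", "author", "y"), "author");
        System.out.println(index.missingByScan("author"));     // prints {1}
        System.out.println(index.missingBySentinel("author")); // prints {1}
    }
}
```

The scan has to touch every term of the field before it can negate, which is why caching the resulting bitset matters; the sentinel lookup avoids the scan but only works if a value can be reserved at indexing time.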