lucene-java-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: query for documents WITHOUT a field?
Date Thu, 16 Feb 2012 21:43:31 GMT
Another possible solution is to insert a custom token at index time for
documents that lack the field, one which cannot otherwise show up in the
index, and then filter on that token.
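
A minimal sketch of that idea, using a toy in-memory inverted index rather than the Lucene API. The class name, the `__NULL__` sentinel, and the list of nullable fields are all assumptions made for illustration; a real sentinel must be a value your data and analyzer can never produce.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an inverted index (NOT the Lucene API): field -> term -> doc IDs.
class SentinelTokenSketch {
    // Hypothetical sentinel; it must never occur as a real value of the field.
    static final String SENTINEL = "__NULL__";

    private final Map<String, Map<String, BitSet>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Index one document. For every nullable field the document lacks,
    // index the sentinel token under that field instead.
    int addDocument(Map<String, String> doc, List<String> nullableFields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            addTerm(e.getKey(), e.getValue(), docId);
        }
        for (String field : nullableFields) {
            if (!doc.containsKey(field)) {
                addTerm(field, SENTINEL, docId);
            }
        }
        return docId;
    }

    private void addTerm(String field, String term, int docId) {
        postings.computeIfAbsent(field, k -> new HashMap<>())
                .computeIfAbsent(term, k -> new BitSet())
                .set(docId);
    }

    // "Field is null" becomes an ordinary single-term lookup: no term scan.
    BitSet docsMissing(String field) {
        Map<String, BitSet> terms = postings.get(field);
        if (terms == null || !terms.containsKey(SENTINEL)) {
            return new BitSet();
        }
        return (BitSet) terms.get(SENTINEL).clone();
    }

    // Small helper to build a document from key/value pairs.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        SentinelTokenSketch idx = new SentinelTokenSketch();
        List<String> nullable = Arrays.asList("color");
        idx.addDocument(doc("color", "red"), nullable);  // doc 0 has the field
        idx.addDocument(doc("size", "L"), nullable);     // doc 1 lacks it
        idx.addDocument(doc("color", "blue"), nullable); // doc 2 has the field
        System.out.println(idx.docsMissing("color"));    // prints {1}
    }
}
```

The cost moves to index time: the lookup is as cheap as any term query, but every writer has to know which fields are nullable.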


On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> As the documentation states:
> Lucene is an inverted index that does not have per-document fields. It only
> knows terms pointing to documents. The query you are looking for is one
> that returns all documents which have no term in the field. To execute it,
> Lucene has to get the term index, iterate over all terms of the field, mark
> their documents in a bitset, and negate that bitset. The filter/query I
> told you about uses the FieldCache to do this. Since 3.6 (also in 3.5, but
> there it is buggy and has a different API) there is another FieldCache that
> returns exactly that bitset. The filter mentioned only uses that bitset
> from this new FieldCache. The FieldCache is populated on first access and
> stays alive as long as the underlying index segment is open (that is, as
> long as the IndexReader is open and that part of the index is not
> refreshed). If you are also sorting on your fields or running other
> queries that use the FieldCache, there is no overhead; otherwise the bitset
> is populated on first access to the filter.
>
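
The mechanism described above (scan every term of the field, mark its documents in a bitset, negate, and reuse the bitset on later calls) can be sketched with plain Java collections. This is a toy model of one index segment, not Lucene's FieldCache; the class and method names are made up for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one field in one index segment (NOT the Lucene API).
class MissingFieldBitsSketch {
    // term -> IDs of the documents that contain the term in this field
    private final Map<String, BitSet> termsOfField = new TreeMap<>();
    private final int maxDoc;           // number of documents in the segment
    private BitSet cachedDocsWithField; // built lazily on first access

    MissingFieldBitsSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addTerm(String term, int docId) {
        termsOfField.computeIfAbsent(term, k -> new BitSet()).set(docId);
        cachedDocsWithField = null; // the model drops its cache when the data changes
    }

    // Documents that have no term at all in this field.
    BitSet docsWithoutField() {
        if (cachedDocsWithField == null) {
            // First access: full scan over all terms of the field.
            BitSet seen = new BitSet(maxDoc);
            for (BitSet postings : termsOfField.values()) {
                seen.or(postings);
            }
            cachedDocsWithField = seen;
        }
        // Later accesses only pay for the copy and the negation.
        BitSet result = (BitSet) cachedDocsWithField.clone();
        result.flip(0, maxDoc);
        return result;
    }

    public static void main(String[] args) {
        MissingFieldBitsSketch field = new MissingFieldBitsSketch(5);
        field.addTerm("red", 0);
        field.addTerm("red", 3);
        field.addTerm("blue", 3);
        field.addTerm("blue", 4);
        System.out.println(field.docsWithoutField()); // prints {1, 2}
    }
}
```

The first call costs a scan proportional to the number of terms and postings in the field, which is why the first access is the slow one.
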
> Lucene 3.5 has no easy way to implement that filter, so a "NULL" pseudo
> term is the only solution (it is also much faster on first access than the
> filter in Lucene 3.6). Later accesses that hit the cache in 3.6 will be
> faster, of course.
>
> Another hacky way to achieve the same result (works with almost any Lucene
> version) is a BooleanQuery consisting of MatchAllDocsQuery() as a MUST
> clause and PrefixQuery(new Term(field, "")) as a MUST_NOT clause. But the
> PrefixQuery will do a full term index scan without caching :-). You may use
> CachingWrapperFilter with PrefixFilter instead.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Tim Eck [mailto:timeck@gmail.com]
>> Sent: Thursday, February 16, 2012 10:14 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Thanks for the fast response. I'll certainly have a look at the upcoming
>> 3.6.x release. What is the expected performance for using a negated filter?
>> In particular does it defeat the index in any way and require a full index
>> scan? Is it different between regular fields and numeric fields?
>>
>> For 3.5 and earlier though, is there any suggestion other than magic
>> values?
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Thursday, February 16, 2012 1:07 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: query for documents WITHOUT a field?
>>
>> Lucene 3.6 will have a FieldValueFilter that can be negated:
>>
>> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
>>
>> (see http://goo.gl/wyjxn)
>>
>> Lucene 3.5 does not yet have it; you can download 3.6 snapshots from
>> Jenkins:
>> http://goo.gl/Ka0gr
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Tim Eck [mailto:teck@terracottatech.com]
>> > Sent: Thursday, February 16, 2012 9:59 PM
>> > To: java-user@lucene.apache.org
>> > Subject: query for documents WITHOUT a field?
>> >
>> > My apologies if this answer is readily available someplace; I've
>> > searched around and not found a definitive answer.
>> >
>> > I'd like to run a query for documents that _do not_ contain particular
>> > indexed fields, to implement something like a SQL query where a column
>> > is null.
>> >
>> > I understand I could possibly use a magic value to represent "null",
>> > but the data I'm searching doesn't lend itself to reserving a value for
>> > null. I also understand I could add an extra field to hold this boolean
>> > isNull state, but would love a better solution :-)
>> >
>> > TIA
>> >
>> >

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

