lucene-java-user mailing list archives

From Vitaly Funstein <vfunst...@gmail.com>
Subject Re: query for documents WITHOUT a field?
Date Fri, 26 Oct 2012 00:55:27 GMT
This is the QueryParser syntax, right? So an API equivalent for the not
null case would be something like this?

BooleanQuery q = new BooleanQuery();
q.add(new BooleanClause(new TermQuery(new Term("first_name", "Zed")), Occur.SHOULD));
q.add(new BooleanClause(new TermRangeQuery("allergies", null, null, true, true), Occur.SHOULD));

Whereas, for "IS NULL" the TermRangeQuery above would need to be wrapped in
another BooleanClause with Occur.MUST_NOT?
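
To make the boolean logic concrete without tying it to one Lucene version, here is a minimal sketch in plain Java that models each clause as a java.util.BitSet of matching document ids. The class name NullOrSketch, its method names, and the sample data are hypothetical, and none of this is Lucene API. It only illustrates why the "IS NULL" part has to be expressed as "all documents minus the documents that have a value" before it is OR-ed with the term clause.

```java
import java.util.BitSet;

// Toy model: each query clause is the set of document ids it matches.
// This is NOT Lucene code; names and data are illustrative assumptions.
final class NullOrSketch {

    // Models "*:* -field:[* TO *]": every document except those with a value.
    static BitSet missing(BitSet docsWithField, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);          // *:* (match all documents)
        result.andNot(docsWithField);   // -field:[* TO *]
        return result;
    }

    // Models two SHOULD clauses: a document matches if either clause matches.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 4;
        BitSet firstNameZed = new BitSet(maxDoc);
        firstNameZed.set(0);            // doc 0: first_name = Zed
        BitSet hasAllergies = new BitSet(maxDoc);
        hasAllergies.set(0);            // docs 0 and 1 have an allergies value
        hasAllergies.set(1);

        // first_name:Zed OR (*:* -allergies:[* TO *])
        BitSet hits = or(firstNameZed, missing(hasAllergies, maxDoc));
        System.out.println(hits);       // docs 0, 2 and 3 match
    }
}
```

In Lucene itself a BooleanQuery that contains only MUST_NOT clauses matches nothing, which is why the negated range needs a match-all clause next to it inside the nested query.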

On Thu, Oct 25, 2012 at 5:29 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> "OR allergies IS NULL" would be "OR (*:* -allergies:[* TO *])" in
> Lucene/Solr.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Vitaly Funstein
> Sent: Thursday, October 25, 2012 8:25 PM
> To: java-user@lucene.apache.org
> Subject: Re: query for documents WITHOUT a field?
>
>
> Sorry for resurrecting an old thread, but how would one go about writing a
> Lucene query similar to this?
>
> SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL
>
> An AND case would be easy since one would just use a simple TermQuery with
> a FieldValueFilter added, but what about other boolean cases? Admittedly,
> this is a contrived example, but the point here is that it seems that since
> filters are always applied to results after they are returned, how would
> one go about making the null-ness of a field part of the query logic?
>
> On Thu, Feb 16, 2012 at 1:45 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> I already mentioned that pseudo NULL term, but the user asked for another
>> solution...
>> --
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, 28213 Bremen
>> http://www.thetaphi.de
>>
>>
>>
>> Jamie Johnson <jej2003@gmail.com> wrote:
>>
>> Another possible solution is while indexing insert a custom token
>> which is impossible to show up in the index otherwise, then do the
>> filter based on that token.
>>
>>
>> On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> > As the documentation states:
>> > Lucene is an inverted index that does not have per-document fields. It only
>> > knows terms pointing to documents. The query you are looking for is one
>> > that returns all documents which have no term in the field. To execute this
>> > query, Lucene has to get the term index, iterate all terms of the field,
>> > mark the matching documents in a bitset, and negate that bitset. The
>> > filter/query I told you about uses the FieldCache to do this. Since 3.6
>> > (also in 3.5, but there it is buggy and the API is different) there is
>> > another FieldCache that returns exactly that bitset. The filter mentioned
>> > only uses that bitset from this new FieldCache. The FieldCache is populated
>> > on first access and stays alive as long as the underlying index segment is
>> > open (that is, as long as the IndexReader is open and those parts of the
>> > index are not refreshed). If you are also sorting against your fields or
>> > doing other queries using FieldCache, there is no overhead; otherwise the
>> > bitset is populated on first access to the filter.
>> >
>> > Lucene 3.5 has no easy way to implement that filter; a "NULL" pseudo term
>> > is the only solution (and also much faster on the first access in Lucene
>> > 3.6). Later accesses hitting the cache in 3.6 will be faster, of course.
>> >
>> > Another hacky way to achieve the same results (works with almost any
>> > Lucene version) is a BooleanQuery consisting of MatchAllDocsQuery() as a
>> > MUST clause and PrefixQuery(field, "") as a MUST_NOT clause. But the
>> > PrefixQuery will do a full term index scan without caching :-). You may use
>> > CachingWrapperFilter with PrefixFilter instead.
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> >
>> >> -----Original Message-----
>> >> From: Tim Eck [mailto:timeck@gmail.com]
>> >> Sent: Thursday, February 16, 2012 10:14 PM
>> >> To: java-user@lucene.apache.org
>> >> Subject: RE: query for documents WITHOUT a field?
>> >>
>> >> Thanks for the fast response. I'll certainly have a look at the upcoming
>> >> 3.6.x release. What is the expected performance for using a negated filter?
>> >> In particular does it defeat the index in any way and require a full index
>> >> scan? Is it different between regular fields and numeric fields?
>> >>
>> >> For 3.5 and earlier though, is there any suggestion other than magic
>> >> values?
>> >>
>> >> -----Original Message-----
>> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> >> Sent: Thursday, February 16, 2012 1:07 PM
>> >> To: java-user@lucene.apache.org
>> >> Subject: RE: query for documents WITHOUT a field?
>> >>
>> >> Lucene 3.6 will have a FieldValueFilter that can be negated:
>> >>
>> >> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
>> >>
>> >> (see http://goo.gl/wyjxn)
>> >>
>> >> Lucene 3.5 does not yet have it; you can download 3.6 snapshots from Jenkins:
>> >> http://goo.gl/Ka0gr
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: uwe@thetaphi.de
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Tim Eck [mailto:teck@terracottatech.com]
>> >> > Sent: Thursday, February 16, 2012 9:59 PM
>> >> > To: java-user@lucene.apache.org
>> >> > Subject: query for documents WITHOUT a field?
>> >> >
>> >> > My apologies if this answer is readily available someplace, I've
>> >> > searched around and not found a definitive answer.
>> >> >
>> >> > I'd like to run a query for documents that _do not_ contain particular
>> >> > indexed fields to implement something like a SQL-like query where a
>> >> > column is null.
>> >> >
>> >> > I understand I could possibly use a magic value to represent "null", but
>> >> > the data I'm searching doesn't lend itself to reserving a value for null.
>> >> > I also understand I could add an extra field to hold this boolean isNull
>> >> > state but would love a better solution :-)
>> >> >
>> >> >
>> >> >
>> >> > TIA
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> _____________________________________________
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
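
The term scan described in the quoted February 2012 reply (iterate all terms of a field, mark the documents seen in a bitset, then negate it) can be sketched in plain Java. This is a toy inverted index, not Lucene's implementation; the class MissingFieldSketch, its methods, and the sample documents are assumptions made for illustration only.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: field -> term -> ids of documents containing the term.
// NOT Lucene's implementation; all names here are illustrative assumptions.
final class MissingFieldSketch {
    private final Map<String, Map<String, BitSet>> postings = new TreeMap<>();
    private int maxDoc = 0;

    // Index one term of one field for the given document id.
    void add(int docId, String field, String term) {
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(term, t -> new BitSet())
                .set(docId);
        maxDoc = Math.max(maxDoc, docId + 1);
    }

    // Full scan over the field's terms, marking every document seen.
    BitSet docsWithField(String field) {
        BitSet seen = new BitSet(maxDoc);
        Map<String, BitSet> terms = postings.get(field);
        if (terms != null) {
            for (BitSet docs : terms.values()) {
                seen.or(docs);
            }
        }
        return seen;
    }

    // Negate the bitset to get the documents with no term in the field.
    BitSet docsWithoutField(String field) {
        BitSet missing = docsWithField(field);
        missing.flip(0, maxDoc);
        return missing;
    }

    public static void main(String[] args) {
        MissingFieldSketch index = new MissingFieldSketch();
        index.add(0, "first_name", "zed");
        index.add(0, "allergies", "pollen");
        index.add(1, "first_name", "amy");   // doc 1 has no allergies term
        index.add(2, "allergies", "nuts");
        System.out.println(index.docsWithoutField("allergies")); // only doc 1
    }
}
```

The cost pattern matches what the reply describes: the scan touches every term of the field once, so caching the resulting bitset per open index segment is what makes repeated "field is missing" checks cheap.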
