lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: query for documents WITHOUT a field?
Date Fri, 26 Oct 2012 00:29:27 GMT
"OR allergies IS NULL" would be "OR (*:* -allergies:[* TO *])" in 
Lucene/Solr.

-- Jack Krupansky
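
To see why the `*:*` clause is needed, here is a minimal sketch of the set logic behind `first_name:Zed OR (*:* -allergies:[* TO *])`. It is a toy model over doc-ID bitsets, not Lucene API code; the class name, method name, and sample documents are made up for illustration. A purely negative clause has nothing to subtract from, so the match-all set supplies the starting point.

```java
import java.util.BitSet;

// Toy model of the boolean logic only; this is NOT Lucene API code.
// Class, method, and sample data are hypothetical.
final class NullOrSketch {

    /** Evaluates first_name:Zed OR (*:* -allergies:[* TO *]) over doc-ID bitsets. */
    static BitSet zedOrNoAllergies(BitSet firstNameZed, BitSet hasAllergies, int maxDoc) {
        BitSet missingAllergies = new BitSet(maxDoc);
        missingAllergies.set(0, maxDoc);        // *:* matches every document
        missingAllergies.andNot(hasAllergies);  // -allergies:[* TO *] drops docs with any value

        BitSet result = (BitSet) firstNameZed.clone();
        result.or(missingAllergies);            // OR with the first_name clause
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 4;
        // doc 0: first_name=Zed, allergies=pollen
        // doc 1: first_name=Amy, allergies=nuts
        // doc 2: first_name=Bob, no allergies field
        // doc 3: first_name=Cat, no allergies field
        BitSet zed = new BitSet(maxDoc);
        zed.set(0);
        BitSet allergies = new BitSet(maxDoc);
        allergies.set(0);
        allergies.set(1);

        System.out.println(zedOrNoAllergies(zed, allergies, maxDoc)); // prints {0, 2, 3}
    }
}
```

Doc 0 matches through the name clause even though it has allergies, while docs 2 and 3 match only because the field is absent.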

-----Original Message----- 
From: Vitaly Funstein
Sent: Thursday, October 25, 2012 8:25 PM
To: java-user@lucene.apache.org
Subject: Re: query for documents WITHOUT a field?

Sorry for resurrecting an old thread, but how would one go about writing a
Lucene query similar to this?

SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL

An AND case would be easy since one would just use a simple TermQuery with
a FieldValueFilter added, but what about other boolean cases? Admittedly,
this is a contrived example, but the point here is that it seems that since
filters are always applied to results after they are returned, how would
one go about making the null-ness of a field part of the query logic?

On Thu, Feb 16, 2012 at 1:45 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> I already mentioned that pseudo NULL term, but the user asked for another
> solution...
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>
>
>
> Jamie Johnson <jej2003@gmail.com> wrote:
>
> Another possible solution is to insert, at index time, a custom token
> that cannot otherwise show up in the index, and then filter on that
> token.
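
The pseudo NULL token idea can be sketched with a toy in-memory postings map. This is plain Java rather than Lucene API code, and the sentinel value `__NULL__` and all class and method names are assumptions chosen for illustration. The sentinel must be a value that real data can never produce.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy postings list for a single field; NOT Lucene API code.
final class NullTokenSketch {
    // Hypothetical sentinel; it must never occur as a real field value.
    static final String NULL_TOKEN = "__NULL__";

    // term -> IDs of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Indexes one document; an empty value list is recorded under the sentinel. */
    void index(int docId, List<String> values) {
        List<String> terms = values.isEmpty()
                ? Collections.singletonList(NULL_TOKEN)
                : values;
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Plain term lookup, the cheapest operation an inverted index offers. */
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySortedSet());
    }

    /** "Field IS NULL" becomes an ordinary term lookup on the sentinel. */
    SortedSet<Integer> docsMissingField() {
        return search(NULL_TOKEN);
    }
}
```

The trade-off is the one raised in the thread: the lookup is fast and needs no cache, but it only works when a value can be reserved for the sentinel.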
>
>
> On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > As the documentation states:
> > Lucene is an inverted index that does not have per-document fields. It
> > only knows terms pointing to documents. The query you are asking for is
> > one that returns all documents which have no term in a field. To execute
> > this query, Lucene has to take the term index, iterate over all terms of
> > the field, mark the matching documents in a bitset, and negate that
> > bitset. The filter/query I mentioned uses the FieldCache to do this.
> > Since 3.6 (also in 3.5, but there it is buggy and the API is different)
> > there is another FieldCache entry that returns exactly that bitset, and
> > the filter mentioned simply uses it. The FieldCache is populated on first
> > access and stays alive as long as the underlying index segment is open
> > (that is, as long as the IndexReader is open and that part of the index
> > has not been refreshed). If you are also sorting on your fields or
> > running other queries that use the FieldCache, there is no overhead;
> > otherwise the bitset is populated on first access to the filter.
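
The mechanism described above (scan every term of the field, mark its documents in a bitset, negate, and keep the bitset cached) can be sketched in plain Java. This is a simplified model, not the FieldCache implementation; the class name, the `scans` counter, and the map-based term index are assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified model of a per-field "docs with field" bitset that is built
// on first access and then reused. NOT the actual FieldCache code.
final class DocsWithFieldSketch {
    private final Map<String, Map<String, int[]>> termIndex; // field -> term -> doc IDs
    private final int maxDoc;
    private final Map<String, BitSet> cache = new HashMap<>();
    int scans = 0; // counts full term scans, to make the caching visible

    DocsWithFieldSketch(Map<String, Map<String, int[]>> termIndex, int maxDoc) {
        this.termIndex = termIndex;
        this.maxDoc = maxDoc;
    }

    /** Documents that have at least one term in the field; cached after the first call. */
    BitSet docsWithField(String field) {
        BitSet bits = cache.get(field);
        if (bits == null) {
            scans++;
            bits = new BitSet(maxDoc);
            Map<String, int[]> terms =
                    termIndex.getOrDefault(field, Collections.emptyMap());
            for (int[] docs : terms.values()) {   // iterate all terms of the field
                for (int doc : docs) {
                    bits.set(doc);                // mark every document holding the term
                }
            }
            cache.put(field, bits);
        }
        return bits;
    }

    /** Negation of the cached bitset: documents with no term in the field. */
    BitSet docsMissingField(String field) {
        BitSet missing = (BitSet) docsWithField(field).clone();
        missing.flip(0, maxDoc);
        return missing;
    }
}
```

Only the first call for a field pays for the term scan; later calls just clone and flip the cached bitset, which mirrors the "populated on first access" behaviour described in the message.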
> >
> > Lucene 3.5 has no easy way to implement that filter; a "NULL" pseudo
> > term is the only solution (and it is also much faster than the Lucene
> > 3.6 filter on the first access). Later accesses hitting the cache in
> > 3.6 will be faster, of course.
> >
> > Another hacky way to achieve the same result (works with almost any
> > Lucene version) is a BooleanQuery consisting of MatchAllDocsQuery() as
> > a MUST clause and PrefixQuery(field, "") as a MUST_NOT clause. But the
> > PrefixQuery will do a full term index scan without caching :-). You may
> > use CachingWrapperFilter with PrefixFilter instead.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Tim Eck [mailto:timeck@gmail.com]
> >> Sent: Thursday, February 16, 2012 10:14 PM
> >> To: java-user@lucene.apache.org
> >> Subject: RE: query for documents WITHOUT a field?
> >>
> >> Thanks for the fast response. I'll certainly have a look at the
> >> upcoming 3.6.x release. What is the expected performance for using a
> >> negated filter? In particular, does it defeat the index in any way and
> >> require a full index scan? Is it different between regular fields and
> >> numeric fields?
> >>
> >> For 3.5 and earlier though, is there any suggestion other than magic
> >> values?
> >>
> >> -----Original Message-----
> >> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> >> Sent: Thursday, February 16, 2012 1:07 PM
> >> To: java-user@lucene.apache.org
> >> Subject: RE: query for documents WITHOUT a field?
> >>
> >> Lucene 3.6 will have a FieldValueFilter that can be negated:
> >>
> >> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
> >>
> >> (see http://goo.gl/wyjxn)
> >>
> >> Lucene 3.5 does not yet have it; you can download 3.6 snapshots from
> >> Jenkins: http://goo.gl/Ka0gr
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >>
> >> > -----Original Message-----
> >> > From: Tim Eck [mailto:teck@terracottatech.com]
> >> > Sent: Thursday, February 16, 2012 9:59 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: query for documents WITHOUT a field?
> >> >
> >> > My apologies if this answer is readily available someplace; I've
> >> > searched around and not found a definitive answer.
> >> >
> >> > I'd like to run a query for documents that _do not_ contain
> >> > particular indexed fields, to implement something like a SQL query
> >> > where a column is null.
> >> >
> >> > I understand I could possibly use a magic value to represent "null",
> >> > but the data I'm searching doesn't lend itself to reserving a value
> >> > for null. I also understand I could add an extra field to hold this
> >> > boolean isNull state, but would love a better solution :-)
> >> >
> >> >
> >> >
> >> > TIA
> >> >
> >> >
> >>
> >>
> >>
> >>_____________________________________________
>
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org



