lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com.au>
Subject Re: Queries not derived from the text index
Date Wed, 08 Feb 2006 23:46:38 GMT
Erik Hatcher wrote:
> One interesting option is to subclass QueryParser and override 
> getFieldQuery.  When the field is "tag", return a FilteredQuery (see 
> trunk codebase, or the nightly 1.9 binaries) using a Filter that 
> interfaces with your database.  Caching of the filters would be 
> desirable for performance reasons.

Aha.  That does sound like it could work, although it will be an 
interesting exercise in trickery.

I'm not sure it would work entirely at the getFieldQuery level; it may
need to happen at the getBooleanQuery level instead.  The reasoning is
this...

   text:camel AND tag:zoo

     This needs to become a single FilteredQuery with a TermQuery
     (text:camel) and a TagFilter (tag:zoo).

   text:camel OR tag:zoo

     This needs to become a BooleanQuery.  The TermQuery (text:camel)
     would be optional, and the other query would be a FilteredQuery
     which filters a MatchAllDocsQuery with a TagFilter (tag:zoo).

   text:camel NOT tag:zoo

     This would be a FilteredQuery with a TermQuery (text:camel) and
     a NotFilter over a TagFilter(tag:zoo).
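
As a rough sketch of what the three cases above should match, each query
or filter can be modelled as the set of matching document ids.  This
uses only java.util.BitSet, not the Lucene API; the class and method
names are hypothetical, and maxDoc stands in for the number of documents
in the index.

```java
import java.util.BitSet;

// Hypothetical model: every query or filter is reduced to a set of doc ids.
final class TagRewriteSketch {

    // text:camel AND tag:zoo
    // Models FilteredQuery(TermQuery, TagFilter): text hits kept only if tagged.
    static BitSet and(BitSet textHits, BitSet tagBits) {
        BitSet result = (BitSet) textHits.clone();
        result.and(tagBits);
        return result;
    }

    // text:camel OR tag:zoo
    // Models a BooleanQuery with two optional clauses: the TermQuery, and
    // a match-all query filtered down by the tag filter.
    static BitSet or(BitSet textHits, BitSet tagBits, int maxDoc) {
        BitSet filteredAll = new BitSet(maxDoc);
        filteredAll.set(0, maxDoc);   // "match all docs"
        filteredAll.and(tagBits);     // filtered by the tag
        BitSet result = (BitSet) textHits.clone();
        result.or(filteredAll);
        return result;
    }

    // text:camel NOT tag:zoo
    // Models FilteredQuery(TermQuery, NotFilter(TagFilter)).
    static BitSet not(BitSet textHits, BitSet tagBits) {
        BitSet result = (BitSet) textHits.clone();
        result.andNot(tagBits);
        return result;
    }

    // Convenience helper for building a doc id set.
    static BitSet bits(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }
}
```

Whatever the parser ends up producing for each case, its hits should
agree with these set operations.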

It's complicated, but it seems like it would work.  The only really
hard cases are those with multiple non-text-index queries, where I
might have to use an AndFilter or similar.  And where a query contains
only non-text-index clauses, I would have to insert a MatchAllDocsQuery
automatically.
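
The decision for a purely conjunctive query might be sketched like this.
It only builds a textual description of the intended query tree and does
not call Lucene; the class and method names are hypothetical, and
TagFilter and AndFilter are the yet-to-be-written filters discussed
above, not existing classes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical planner for clauses that are all required (ANDed together).
final class ConjunctionPlanSketch {

    static String plan(List<String> textTerms, List<String> tags) {
        // Pick the query that the filter will wrap.
        String base;
        if (textTerms.isEmpty()) {
            // Only non-text-index clauses: filter over every document.
            base = "MatchAllDocsQuery";
        } else if (textTerms.size() == 1) {
            base = "TermQuery(text:" + textTerms.get(0) + ")";
        } else {
            base = textTerms.stream()
                    .map(t -> "+TermQuery(text:" + t + ")")
                    .collect(Collectors.joining(" ", "BooleanQuery(", ")"));
        }

        if (tags.isEmpty()) {
            return base; // nothing to filter, plain text query
        }

        // A single tag needs one filter; several need combining.
        String filter;
        if (tags.size() == 1) {
            filter = "TagFilter(" + tags.get(0) + ")";
        } else {
            filter = tags.stream()
                    .map(t -> "TagFilter(" + t + ")")
                    .collect(Collectors.joining(", ", "AndFilter(", ")"));
        }
        return "FilteredQuery(" + base + ", " + filter + ")";
    }
}
```

For example, a query with tags zoo and farm and no text terms would be
planned as a FilteredQuery over MatchAllDocsQuery with an AndFilter of
the two TagFilters.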

> In the latest codebase, there is a MatchAllDocsQuery that can be used in 
> this case.  I also have implemented this sort of thing with a custom 
> query parser for a client.

This sounds interesting in itself.  I was trying to write one of these 
myself, not realising that it had been added into source control 
recently.  My plan was to get the query returning all docs and then 
figure out how to abstract it so that it could filter the returned docs 
down on the fly.

I may yet be able to use MatchAllDocsQuery as a basis for this, since
it will contain much of the framework code I was finding hard to write
myself (having to write Query, Weight and Scorer classes is something I
wanted to abstract away from our own custom queries.)

My main motivation for wanting to use a "real" query as opposed to a
FilteredQuery is that filters cost more up front, and once cached they
also cost memory (our indexes are huge, so they cost a LOT of memory.)
Real queries are more or less a BitSet implemented as an iterator,
which is far preferable for us.
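
To make the trade-off concrete, here is a minimal sketch of the
iterator style, assuming a per-document lookup is available.  The class
name, the IntPredicate standing in for the external tag lookup, and the
next()/doc() method names are all assumptions for illustration, not the
Lucene API.  The helper at the end gives the size of a cached filter
held as one bit per document.

```java
import java.util.function.IntPredicate;

// Hypothetical lazy walk over the docs carrying a tag; nothing is
// materialised up front and no per-index bit array is kept.
final class LazyTagDocs {
    private final IntPredicate hasTag; // stand-in for the external lookup
    private final int maxDoc;          // number of documents in the index
    private int doc = -1;

    LazyTagDocs(IntPredicate hasTag, int maxDoc) {
        this.hasTag = hasTag;
        this.maxDoc = maxDoc;
    }

    // Advances to the next matching doc id; false once exhausted.
    boolean next() {
        while (doc < maxDoc - 1) {
            doc++;
            if (hasTag.test(doc)) {
                return true;
            }
        }
        doc = maxDoc;
        return false;
    }

    // Current doc id; only meaningful after next() returned true.
    int doc() {
        return doc;
    }

    // Bytes needed to cache one filter as a bit per document.
    static long cachedFilterBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }
}
```

With an assumed index of 100 million documents, each cached filter of
this kind takes about 12.5 MB, whereas the iterator holds only its
current position.  A real implementation would probably want to pull
matching ids from the database in bulk rather than test one document at
a time.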

Daniel


-- 
Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.
