lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Noll <>
Subject Re: Queries not derived from the text index
Date Tue, 07 Feb 2006 23:17:47 GMT
Erik Hatcher wrote:
> On Feb 7, 2006, at 1:09 AM, Daniel Noll wrote:
>> I've got an unusual (if not crazy) question about implementing custom 
>> queries.
>> Basically we have a UI where a user can enter a query and then select 
>> a bunch of filters to be applied to the query.  These filters are 
>> currently implemented using a fairly simple wrapper around Lucene's 
>> own Filter class.
>> Now we have one particular customer who says he wants to AND and OR 
>> these filters in various combinations.  Now, I know that this is 
>> possible on the model side, because I've already created AndFilter and 
>> OrFilter classes to do this sort of thing.  However, the view of the 
>> user interface would be impossible to keep simple if I were to bulk it 
>> up to support this kind of filtering.
> One thing that may be of use is the ChainedFilter:

Looks very similar to our own AndFilter/OrFilter/etc, just wrapping 
everything into one class instead of separating it out.  Actually, with 
our stuff we have hidden everything behind a FilterUtils.and() method 
which also handles special cases like not making it AND if there are 0 
or 1 filters in the list.

Either way we have that much working already. :-)

>> Is it possible to customise the QueryParser so that it returns Query 
>> instances that have no relationship to the text index whatsoever?  For 
>> instance, many of our existing filters build their bitsets exclusively 
>> using a database, and we would need to keep this as-is because we 
>> don't want to modify the text index itself.
> I don't really follow what you're after here.  Could you give a concrete 
> example of what you're after with "query" parsing?

Here's an example which is very much like what we need to do.

Take a text index with a few gigs of data in it.  For simplicity, let's 
say it has a single field called "text".  We allow the user to annotate 
the documents in this text index.

We keep the user's annotations outside of the text index because:

   1. Changing the contents of a document in the text index is
      non-trivial, requiring deleting and re-adding the document,
      and if the document contains unstored fields this is extremely

   2. It's helpful to write-protect a text index in order to be more
      certain that nobody has tampered with the text index after it was
      created; and

   3. It allows tricks like multiple users updating the annotations at
      the same time, and seeing each others' changes.

So a user might want to enter something like this:

     text:camel AND tag:zoo

In this case we would want a real FieldQuery object for the text:camel 
portion, and a non-Lucene Query instance for the "tag:zoo" portion which 
actually queries the Tags table in the database instead of the text index.

This is a simple example, the user might want to say:

     text:camel AND (tag:zoo OR tag:desert)

This kind of logic could be done, as mentioned, using an OrFilter. 
However, there is one more case which can't be done using tricks with 

     (text:camel AND tag:zoo) OR (text:fish AND tag:aquarium)

If we were to make these work as an actual Query, then the user is 
completely free to enter what they want.

> It's certainly feasible to build a custom parser (JavaCC or otherwise) 
> that does whatever you want, but that can be quite a complex endeavor.

Actually, we don't need to change the syntax.  Our plan would be to 
override the getFieldQuery method on QueryParser in order to drop our 
own Query class in on top of where a "real" field query would have been.

Incidentally, we already override QueryParser in order to perform a 
couple of boolean queries which Lucene for some reason doesn't like.

For example:

     NOT text:camel

We change that on the fly to be like "dummy:1 NOT text:camel" where 
"dummy" is just a field which exists on every document and always 
contains "1"... a cheap trick, but it works.  Actually, if it turns out 
we can make Query instances that don't depend on the text index, then we 
can fake this a little better.


Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message