lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Powers" <jpow...@configureone.com>
Subject RE: Queries not derived from the text index
Date Wed, 08 Feb 2006 17:51:49 GMT
This may be a tangent, but for my filters and searches, I construct the
query with "+" and "-" and what not..   is this not the right way to do
this?    I haven't had to extend or write any special AND or OR classes,
I just write the query and search the once.     Any advantage to writing
Filter subclasses?

-----Original Message-----
From: Daniel Noll [mailto:daniel@nuix.com.au] 
Sent: Tuesday, February 07, 2006 5:18 PM
To: java-user@lucene.apache.org
Subject: Re: Queries not derived from the text index

Erik Hatcher wrote:
> 
> On Feb 7, 2006, at 1:09 AM, Daniel Noll wrote:
>> I've got an unusual (if not crazy) question about implementing custom

>> queries.
>>
>> Basically we have a UI where a user can enter a query and then select

>> a bunch of filters to be applied to the query.  These filters are 
>> currently implemented using a fairly simple wrapper around Lucene's 
>> own Filter class.
>>
>> Now we have one particular customer who says he wants to AND and OR 
>> these filters in various combinations.  Now, I know that this is 
>> possible on the model side, because I've already created AndFilter
and 
>> OrFilter classes to do this sort of thing.  However, the view of the 
>> user interface would be impossible to keep simple if I were to bulk
it 
>> up to support this kind of filtering.
> 
> One thing that may be of use is the ChainedFilter:
> 
>
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous/
src/java/org/apache/lucene/misc/ChainedFilter.java 

Looks very similar to our own AndFilter/OrFilter/etc, just wrapping 
everything into one class instead of separating it out.  Actually, with 
our stuff we have hidden everything behind a FilterUtils.and() method 
which also handles special cases like not making it AND if there are 0 
or 1 filters in the list.

Either way we have that much working already. :-)

>> Is it possible to customise the QueryParser so that it returns Query 
>> instances that have no relationship to the text index whatsoever?
For 
>> instance, many of our existing filters build their bitsets
exclusively 
>> using a database, and we would need to keep this as-is because we 
>> don't want to modify the text index itself.
> 
> I don't really follow what you're after here.  Could you give a
concrete 
> example of what you're after with "query" parsing?

Here's an example which is very much like what we need to do.

Take a text index with a few gigs of data in it.  For simplicity, let's 
say it has a single field called "text".  We allow the user to annotate 
the documents in this text index.

We keep the user's annotations outside of the text index because:

   1. Changing the contents of a document in the text index is
      non-trivial, requiring deleting and re-adding the document,
      and if the document contains unstored fields this is extremely
      difficult;

   2. It's helpful to write-protect a text index in order to be more
      certain that nobody has tampered with the text index after it was
      created; and

   3. It allows tricks like multiple users updating the annotations at
      the same time, and seeing each others' changes.

So a user might want to enter something like this:

     text:camel AND tag:zoo

In this case we would want a real FieldQuery object for the text:camel 
portion, and a non-Lucene Query instance for the "tag:zoo" portion which

actually queries the Tags table in the database instead of the text
index.

This is a simple example, the user might want to say:

     text:camel AND (tag:zoo OR tag:desert)

This kind of logic could be done, as mentioned, using an OrFilter. 
However, there is one more case which can't be done using tricks with 
filters:

     (text:camel AND tag:zoo) OR (text:fish AND tag:aquarium)

If we were to make these work as an actual Query, then the user is 
completely free to enter what they want.

> It's certainly feasible to build a custom parser (JavaCC or otherwise)

> that does whatever you want, but that can be quite a complex endeavor.

Actually, we don't need to change the syntax.  Our plan would be to 
override the getFieldQuery method on QueryParser in order to drop our 
own Query class in on top of where a "real" field query would have been.

Incidentally, we already override QueryParser in order to perform a 
couple of boolean queries which Lucene for some reason doesn't like.

For example:

     NOT text:camel

We change that on the fly to be like "dummy:1 NOT text:camel" where 
"dummy" is just a field which exists on every document and always 
contains "1"... a cheap trick, but it works.  Actually, if it turns out 
we can make Query instances that don't depend on the text index, then we

can fake this a little better.

Daniel



-- 

Daniel Noll

Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699
Fax:   (02) 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message