lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: MultiSegmentQueryFilter enhancement for interactive indexes?
Date Tue, 11 Jul 2006 03:42:59 GMT
Creating the filters is very expensive; it usually involves a large
range query. We also convert all range and prefix queries to filters,
since scoring these does not make sense to us.

For example, show sales where the sale price was greater than 0 and less
than 500k. Frequently the user will get too many results, and will then
add another term (like the neighborhood).

Having the "sale price" filter cached helps performance immensely.
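
As a rough illustration of why the cached filter helps, here is a minimal
sketch in plain Java. It does not use the Lucene API: the PRICES and HOODS
arrays, the CachedFilterSketch class, and the cache key string are all
hypothetical stand-ins for an index and a filter. The point is only that
the expensive range scan runs once, and the refined search reuses its bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Caches filter bits by key so that refined searches can reuse them. */
final class CachedFilterSketch {
    // Hypothetical stand-in for an index: one sale price per document id.
    static final int[] PRICES = {120_000, 0, 750_000, 499_999, 300_000};
    // Hypothetical neighborhood per document id.
    static final String[] HOODS = {"north", "north", "south", "south", "north"};

    private final Map<String, BitSet> cache = new HashMap<>();
    int builds = 0; // counts how often the expensive scan actually runs

    /** Returns the cached bits for key, scanning all documents only on a miss. */
    BitSet bits(String key, IntPredicate matches) {
        return cache.computeIfAbsent(key, k -> {
            builds++;
            BitSet b = new BitSet(PRICES.length);
            for (int doc = 0; doc < PRICES.length; doc++) {
                if (matches.test(doc)) b.set(doc);
            }
            return b;
        });
    }

    /** Doc ids with 0 < price < 500k, optionally limited to one neighborhood. */
    BitSet search(String hood) {
        BitSet cached = bits("price:(0,500000)",
                d -> PRICES[d] > 0 && PRICES[d] < 500_000);
        BitSet result = (BitSet) cached.clone(); // never mutate the cached bits
        if (hood != null) {
            for (int d = result.nextSetBit(0); d >= 0; d = result.nextSetBit(d + 1)) {
                if (!HOODS[d].equals(hood)) result.clear(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CachedFilterSketch s = new CachedFilterSketch();
        System.out.println(s.search(null));    // price filter only
        System.out.println(s.search("north")); // user adds the neighborhood
        System.out.println("range scans: " + s.builds);
    }
}
```

The second search adds the neighborhood term but does not repeat the price
scan, which is the saving described above.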

This is a bit of a contrived example (since sales are not updated  
very frequently).

A better example comes from Nutch. In the query optimizer, terms with
a 0 boost that occur in N percent of the documents are converted into
a filter. Having to recreate this filter every time a document is added
is very expensive. With this change there is no performance hit to
using the filter optimization with highly interactive indices.
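
The per-reader caching discussed in this thread could look roughly like
the sketch below. It is plain Java and not the actual patch: Segment is a
hypothetical stand-in for a sub-reader of a MultiReader, the class and
field names are made up for illustration, and deletions are ignored (as
noted in the quoted message, the query is assumed to remove deleted
documents already).

```java
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

/** Keeps one BitSet per segment and rebuilds bits only for unseen segments. */
final class PerSegmentFilterCacheSketch {
    /** Hypothetical stand-in for one sub-reader: a fixed array of values. */
    static final class Segment {
        final int[] values;
        Segment(int... values) { this.values = values; }
    }

    private final IntPredicate accept;
    // Keyed by identity: a reused segment object means its bits are still valid.
    private Map<Segment, BitSet> cache = new IdentityHashMap<>();
    int segmentsScanned = 0; // counts how many segments were actually scanned

    PerSegmentFilterCacheSketch(IntPredicate accept) { this.accept = accept; }

    /** Combines per-segment bits into one BitSet over the concatenated doc ids. */
    BitSet bits(List<Segment> segments) {
        Map<Segment, BitSet> next = new IdentityHashMap<>();
        BitSet combined = new BitSet();
        int docBase = 0; // doc id offset of the current segment
        for (Segment seg : segments) {
            BitSet segBits = cache.get(seg);
            if (segBits == null) { // unseen segment: scan only this one
                segmentsScanned++;
                segBits = new BitSet(seg.values.length);
                for (int i = 0; i < seg.values.length; i++) {
                    if (accept.test(seg.values[i])) segBits.set(i);
                }
            }
            next.put(seg, segBits);
            for (int i = segBits.nextSetBit(0); i >= 0; i = segBits.nextSetBit(i + 1)) {
                combined.set(docBase + i);
            }
            docBase += seg.values.length;
        }
        cache = next; // drops bits for segments that are no longer present
        return combined;
    }

    public static void main(String[] args) {
        PerSegmentFilterCacheSketch f =
                new PerSegmentFilterCacheSketch(v -> v > 0 && v < 500);
        Segment a = new Segment(100, 900, 250);
        Segment b = new Segment(0, 499);
        System.out.println(f.bits(List.of(a, b)));
        Segment c = new Segment(10); // a document is added as a new segment
        System.out.println(f.bits(List.of(a, b, c)));
        System.out.println("segments scanned: " + f.segmentsScanned);
    }
}
```

Adding a document only costs a scan of the new, small segment; the bits
for the existing segments are reused unchanged.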

On Jul 10, 2006, at 10:31 PM, Bruce Ritchie wrote:

> Robert,
>
> Can you quantify 'through the roof' a bit? Are the filters that you are
> creating that expensive to create, or is it the usage of BitSets that is
> the real cause of the performance improvement you've seen?
>
>
> Regards,
>
> Bruce Ritchie
>
> -----Original Message-----
> From: robert engels [mailto:rengels@ix.netcom.com]
> Sent: Friday, July 07, 2006 9:35 PM
> To: java-dev@lucene.apache.org
> Subject: Re: MultiSegmentQueryFilter enhancement for interactive indexes?
>
> I implemented it and it works great. I didn't worry about the deletions
> since by the time a filter is used the deleted documents are already
> removed by the query. The only problem that arose out of this was for
> things like the ConstantScoreQuery (which uses a filter) - I needed to
> modify this query to ignore deleted documents.
>
> Now I have incremental cached filters - the query performance is going
> through the roof.
>
>
>
> On Jul 7, 2006, at 2:47 PM, Chris Hostetter wrote:
>
>>
>> I'm no segments/MultiReader expert, but your idea sounds good to me
>> ... it seems like it would certainly work in the "new segments"
>> situation.
>>
>> One thing i don't see you mention is dealing with deletions ... i'm
>> not sure if deleting documents causes the version number of an
>> IndexReader to change or not (if it does your job is easy) but even if
>> it doesn't I'm guessing you could say that if hasDeletions() returns
>> true, you have to assume you need to invalidate your cached bits
>> (worst case scenario you are invalidating the cache as often as it is
>> now)
>>
>>
>> : Date: Fri, 7 Jul 2006 00:32:54 -0500
>> : From: robert engels <rengels@ix.netcom.com>
>> : Reply-To: java-dev@lucene.apache.org
>> : To: Lucene-Dev <java-dev@lucene.apache.org>
>> : Subject: MultiSegmentQueryFilter enhancement for interactive indexes?
>> :
>> : I thought of a possible enhancement - before I go down the road, I am
>> : looking for some input from the community.
>> :
>> : Currently, the QueryFilter caches the bits based upon the IndexReader.
>> :
>> : The problem with this is small incremental changes to the index
>> : invalidate the cache.
>> :
>> : What if instead the filter determined that the underlying IndexReader
>> : was a MultiReader and then maintained a bitset for each reader,
>> : combining them in bits() when requested? The filter could check if
>> : any of the underlying readers were different (removed or added)
>> : and then just create a new bitset for that reader. With the new
>> : non-bit set filter implementations this could be even more memory
>> : efficient since the bitsets would not need to be combined into a
>> : single bitset.
>> :
>> : With the previous work on "reopen" so that segments are reused, this
>> : would allow filters to be far more useful in a highly interactive
>> : environment.
>> :
>> : What do you think?
>> :
>> : ---------------------------------------------------------------------
>> : To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> : For additional commands, e-mail: java-dev-help@lucene.apache.org
>> :
>>
>>
>>
>> -Hoss
>>



