lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Trejkaz <trej...@trypticon.org>
Subject Re: Does anyone have tips on managing cached filters?
Date Tue, 27 Nov 2012 11:17:30 GMT
On Tue, Nov 27, 2012 at 9:31 AM, Robert Muir <rcmuir@gmail.com> wrote:
> On Thu, Nov 22, 2012 at 11:10 PM, Trejkaz <trejkaz@trypticon.org> wrote:
>
>>
>> As for actually doing the invalidation, CachingWrapperFilter itself
>> doesn't appear to have any mechanism for invalidation at all, so I
>> imagine I will be building a variation of it with additional methods
>> to invalidate parts of the cache.
>>
>>
> Actually it does, it uses a weakhashmap keyed on either the segment
> (core+deletes) or just the segment's core.

Ah, yeah... I should have been clearer on what I meant there.

If you want to make a filter which relies on data that isn't in the
index, there is no mechanism for invalidation. One example of it is if
you have a filter which essentially constructs a query based on the
contents of a text file (like a word list.) Another example is with
tagging, with the tags stored in an external database.

At the moment we use a separate level of filter cache which asks the
contained filter whether it's still OK to use (if the timestamp on the
file changes, it gets ejected from the cache.) I suspect the same
cache is useful anyway, as it also holds onto the filter instances so
that they don't get collected too soon (filters can come out of our
query parser, so the caller can't conveniently hold onto the instances
in all cases. Sometimes they do two similar queries which happen to
call the same filter, so caching the entire resulting query doesn't
help either.)

An interesting, somewhat-related issue is that for some filters, we
can't keep the contents of the file itself in memory due to size
limits, so we have to read it on the fly. When there are multiple
segments, the file gets read multiple times. So it's a rare case where
computing the filter across all readers might actually come out faster
than computing it per-segment...

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message