lucene-dev mailing list archives

From Paul Elschot <>
Subject Re: Question on CachingWrapperFilter
Date Tue, 02 Jun 2009 18:35:03 GMT
On Tuesday 02 June 2009 16:39:06 Shai Erera wrote:
> Hi
> I read CWF today and initially I thought this is going to cache a Filter
> in-memory for me, so that I can more efficiently use it for subsequent
> searches. But I learned that all it does is cache the DocIdSet returned by
> the wrapped Filter.
> This is good in and of itself, but I wonder if we shouldn't go the extra
> mile and wrap stuff in memory for Filters which don't operate from memory.

Caching the DocIdSet was good enough until QueryWrapperFilter started
returning a Scorer instead of a disi (DocIdSetIterator) based on an
(Open)BitSet.

> For example - I have a Filter which reads information from a Payload as it's
> iterated on, so it doesn't keep anything in memory (it's per-user
> information, so I haven't decided yet if I can afford caching it in-memory
> and whether it will be beneficial). Caching that sort of Filter by CWF will
> obviously not improve anything.
> I'm not sure what to do here:
> 1. Just reflect that in the javadoc (it is very confusing saying "Wraps
> another filter's result and caches it", which is not true)
> 2. Introduce a class which takes a Filter and loads it into memory (I think
> I read an issue/discussion about this), to an OpenBitSet for example (but we
> need to know the number of results in advance, or grow the array as we go
> along).
> 3. Don't use CWF, write a "load-a-Filter-into-in-memory-Filter" utility, and
> cache the Filters w/ the user as Key.

For that, one could subclass CWF and override the docIdSetToCache method
to return an OpenBitSetDISI constructed from the given disi.
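
A minimal sketch of that materializing step, written in plain Java so that it
runs without Lucene on the classpath. The names DocIdIterator, materialize and
everyNth are hypothetical stand-ins invented for this illustration, and
java.util.BitSet stands in for OpenBitSet; none of this is the Lucene API. In
a real CWF subclass, a loop like this would presumably sit behind the
overridden docIdSetToCache method.

```java
import java.util.BitSet;

public class MaterializeSketch {
    /** Hypothetical stand-in for a doc id iterator; NOT the Lucene class. */
    interface DocIdIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Returns the next doc id in increasing order, or NO_MORE_DOCS when exhausted. */
        int nextDoc();
    }

    /** Drains the (possibly expensive) iterator once into an in-memory bit set. */
    static BitSet materialize(DocIdIterator it, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        int doc;
        while ((doc = it.nextDoc()) != DocIdIterator.NO_MORE_DOCS) {
            bits.set(doc);
        }
        return bits;
    }

    /** Toy iterator for the demo: every step-th doc id below maxDoc. */
    static DocIdIterator everyNth(final int step, final int maxDoc) {
        return new DocIdIterator() {
            private int next = 0;

            public int nextDoc() {
                if (next >= maxDoc) {
                    return NO_MORE_DOCS;
                }
                int doc = next;
                next += step;
                return doc;
            }
        };
    }

    public static void main(String[] args) {
        // The iterator is consumed exactly once; later lookups hit the bit set.
        BitSet cached = materialize(everyNth(3, 10), 10);
        System.out.println(cached);
    }
}
```

The point of the design is that the expensive iteration is paid once per
cached entry, after which membership tests and re-iteration run from memory.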

> I will probably need to do the second part of (3) anyway, so I'm asking
> whether such a utility is useful to exist in Lucene, and perhaps there's
> already one (I thought I read somewhere about the ability to execute a Query
> and get back a Filter, or use the results as a Filter)?

That is what QueryWrapperFilter does.

> I looked at
> QueryWrapperFilter, but it doesn't seem to give me what I need, since its
> getDocIdSet method returns an iterator which is the Scorer of the Query that
> it wraps.

The Scorer seems to be what you need, but there are cheaper disis, see below.

> Anyway, I think the documentation of CWF should be fixed and made clearer.
> Any thoughts?

The basic problem is that disis from DocIdSets come in two variations: expensive
ones e.g. based on a query, and cheap ones based e.g. on an OpenBitSet or on
a SortedVIntList.
One would normally want to cache a DocIdSet that provides a cheap disi.

For the javadocs of the current CWF it could be sufficient to mention more
prominently that the default CWF caches the given DocIdSet, basically
assuming that its disi is cheap.

But it might be a good idea to change the default implementation to check
whether the given DocIdSet is already an OpenBitSet. If it is, cache it
directly; otherwise cache an OpenBitSetDISI built from its disi.
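
The check suggested above could look roughly like the following. This is a
sketch in plain Java, not Lucene code: DocIdSet, BitDocIdSet and toCacheable
are hypothetical names made up for the illustration, and java.util.BitSet
stands in for OpenBitSet.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public class CacheChoiceSketch {
    /** Hypothetical stand-in for a doc id set; NOT the Lucene class. */
    interface DocIdSet {
        PrimitiveIterator.OfInt iterator();
    }

    /** Cheap variant: backed by an in-memory bit set. */
    static final class BitDocIdSet implements DocIdSet {
        final BitSet bits;

        BitDocIdSet(BitSet bits) {
            this.bits = bits;
        }

        public PrimitiveIterator.OfInt iterator() {
            return bits.stream().iterator();
        }
    }

    /** Returns the set to cache: as-is if already bit-set backed, else a materialized copy. */
    static DocIdSet toCacheable(DocIdSet set, int maxDoc) {
        if (set instanceof BitDocIdSet) {
            return set; // already cheap to iterate, cache directly
        }
        BitSet bits = new BitSet(maxDoc);
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return new BitDocIdSet(bits);
    }

    public static void main(String[] args) {
        // Toy "expensive" set: recomputes the even doc ids on every iteration.
        DocIdSet expensive = () -> IntStream.range(0, 10).filter(d -> d % 2 == 0).iterator();
        DocIdSet cached = toCacheable(expensive, 10);
        System.out.println(cached instanceof BitDocIdSet);
        // A set that is already cheap is returned unchanged rather than copied.
        System.out.println(toCacheable(cached, 10) == cached);
    }
}
```

With a default like this, a caller would no longer need to know whether the
wrapped filter produces a cheap or an expensive disi before caching it.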

Paul Elschot
