lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <>
Subject [jira] Commented: (LUCENE-1296) Allow use of compact DocIdSet in CachingWrapperFilter
Date Sun, 01 Jun 2008 22:51:44 GMT


Paul Elschot commented on LUCENE-1296:

I tried to come up with a sensible performance test to determine a good criterion for choosing
between OpenBitSet and SortedVIntList as the DocIdSet data structure to be cached.
The patch already has a criterion for this in the docIdSetToCache() method of CachingWrapperFilter,
but it is based only on byte size: it favours SortedVIntList when that is definitely more
compact than OpenBitSet.
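
To make the byte-size comparison concrete, here is a rough sketch of the two size estimates,
ignoring object headers. It assumes a bit set backed by 64-bit words (as OpenBitSet is) and a
list that stores each doc id as a VInt-encoded gap from the previous one (the idea behind
SortedVIntList). The class and method names are hypothetical; this is not the code in the patch.

```java
public class CacheSizeEstimate {
    // Bytes needed to VInt-encode a non-negative int:
    // 7 payload bits per byte, high bit set when more bytes follow.
    static int vIntSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Bytes for a bit set over maxDoc docs stored in 64-bit words.
    static long bitSetBytes(int maxDoc) {
        long words = (maxDoc + 63L) / 64;
        return words * 8;
    }

    // Bytes for sorted doc ids stored as VInt gaps (the first gap is taken from 0).
    static long vIntListBytes(int[] sortedDocs) {
        long bytes = 0;
        int last = 0;
        for (int doc : sortedDocs) {
            bytes += vIntSize(doc - last);
            last = doc;
        }
        return bytes;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] docs = new int[maxDoc / 100]; // every 100th doc matches
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i * 100;
        }
        System.out.println("bit set bytes:   " + bitSetBytes(maxDoc));
        System.out.println("vint list bytes: " + vIntListBytes(docs));
    }
}
```

For 10,000 matching docs out of 1,000,000 this prints 125000 bytes for the bit set and
10000 bytes for the gap list, since every gap of 100 fits in a single VInt byte.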

The current criterion prefers SortedVIntList over OpenBitSet for caching when
cardinality < maxDocs/9, where cardinality is the number of bits set in the OpenBitSet.
The constant 9 could be replaced by a configuration parameter to make performance experiments
easy. It could turn out that a value larger than 9 is "optimal" for runtime.
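
A hypothetical sketch of such a configurable threshold, with the divisor passed in instead of
the fixed 9 (all names here are made up for illustration). Since every VInt takes at least one
byte and a bit set takes about maxDoc/8 bytes, a divisor of 9 rather than 8 presumably leaves
some margin for gaps that need more than one byte.

```java
public class CacheChoice {
    // Hypothetical stand-in for the threshold in docIdSetToCache();
    // the divisor (9 in the current patch) is a parameter for experiments.
    enum Representation { OPEN_BIT_SET, SORTED_VINT_LIST }

    private final int divisor;

    CacheChoice(int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        this.divisor = divisor;
    }

    // Prefer the compact list only when the filter is sparse enough.
    Representation choose(int cardinality, int maxDoc) {
        return cardinality < maxDoc / divisor
                ? Representation.SORTED_VINT_LIST
                : Representation.OPEN_BIT_SET;
    }

    public static void main(String[] args) {
        int maxDoc = 900_000;     // maxDoc / 9 = 100,000; maxDoc / 20 = 45,000
        int cardinality = 60_000;
        System.out.println(new CacheChoice(9).choose(cardinality, maxDoc));
        System.out.println(new CacheChoice(20).choose(cardinality, maxDoc));
    }
}
```

With 60,000 matching docs out of 900,000, divisor 9 picks SORTED_VINT_LIST while divisor 20
picks OPEN_BIT_SET, so raising the divisor moves more filters over to the bitset.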

In some cases OpenBitSet can be faster on skipTo(int docNum) than SortedVIntList, even when
SortedVIntList is more compact. As Filters can be expected to use skipTo() heavily, this could
be important for performance.
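
To illustrate why skipTo() can favour the bitset: a bitset can jump straight to the word that
contains the target, while a VInt gap list has to decode every gap before the target. The sketch
uses java.util.BitSet.nextSetBit() as a stand-in for OpenBitSet and a simplified, hypothetical
gap list rather than Lucene's SortedVIntList (which may do this differently). It counts decoded
entries instead of timing anything.

```java
import java.util.BitSet;

public class SkipToSketch {
    // Returned when the list is exhausted (hypothetical sentinel).
    static final int NO_MORE = Integer.MAX_VALUE;

    // Simplified doc id list stored as VInt-encoded gaps; not Lucene's SortedVIntList.
    static final class VIntDocList {
        private final byte[] bytes;
        private final int length;

        VIntDocList(int[] sortedDocs) {
            byte[] buf = new byte[sortedDocs.length * 5]; // a VInt needs at most 5 bytes
            int pos = 0;
            int last = 0;
            for (int doc : sortedDocs) {
                int gap = doc - last;
                while ((gap & ~0x7F) != 0) {              // more than 7 bits left
                    buf[pos++] = (byte) ((gap & 0x7F) | 0x80);
                    gap >>>= 7;
                }
                buf[pos++] = (byte) gap;
                last = doc;
            }
            bytes = buf;
            length = pos;
        }

        Cursor cursor() {
            return new Cursor();
        }

        final class Cursor {
            private int pos = 0;     // next byte to decode
            private int doc = -1;    // current doc, -1 before the first skipTo
            private int sum = 0;     // running sum of decoded gaps
            int entriesDecoded = 0;  // gaps decoded so far

            // Moves to the first doc >= target and returns it, or NO_MORE.
            // Every gap up to the target must be decoded in order.
            int skipTo(int target) {
                if (doc >= target) {
                    return doc;
                }
                while (pos < length) {
                    int gap = 0;
                    int shift = 0;
                    byte b;
                    do {
                        b = bytes[pos++];
                        gap |= (b & 0x7F) << shift;
                        shift += 7;
                    } while ((b & 0x80) != 0);
                    sum += gap;
                    doc = sum;
                    entriesDecoded++;
                    if (doc >= target) {
                        return doc;
                    }
                }
                doc = NO_MORE;
                return NO_MORE;
            }
        }
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] docs = new int[maxDoc / 100];   // every 100th doc matches
        BitSet bits = new BitSet(maxDoc);
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i * 100;
            bits.set(docs[i]);
        }
        VIntDocList.Cursor cursor = new VIntDocList(docs).cursor();
        int target = 900_000;
        // BitSet.nextSetBit starts scanning at the word that holds the target.
        System.out.println(cursor.skipTo(target) + " " + bits.nextSetBit(target));
        System.out.println("gaps decoded: " + cursor.entriesDecoded);
    }
}
```

This prints "900000 900000" and then "gaps decoded: 9001": both structures find the same doc,
but the gap list had to walk through 9001 entries to get there.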

Even though it might be possible to measure skipTo() performance directly, the effect on
garbage collection of SortedVIntList's more compact cached data structure is (pretty
close to) impossible to measure in a simple test case.

Eks Dev had some interesting results on this in the very early stages of LUCENE-584 (September
2006), so I wonder whether those results could somehow be confirmed using the patch here and
the current trunk.


> Allow use of compact DocIdSet in CachingWrapperFilter
> -----------------------------------------------------
>                 Key: LUCENE-1296
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Paul Elschot
>            Assignee: Michael Busch
>            Priority: Minor
>         Attachments: cachedFilter20080529.patch
> Extends CachingWrapperFilter with a protected method to determine the DocIdSet to be

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
