lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
Date Sun, 26 Jun 2011 20:41:47 GMT


Uwe Schindler commented on LUCENE-1536:

Hi Mike,
Nice patch, only a little bit big. I reviewed the essential parts, like applying the filter
in IndexSearcher; really cool. CachingWrapperFilter also looks fine (not closely reviewed).

My question: do we really need to invert the delDocs in *this* issue? The IndexSearcher
impl could also be done with a simple OrNotBits(delDocs, filterDocs) wrapper (instead of AndBits),
falling back to NotBits if no delDocs are available. The patch is hard to read because of the
inversion. In general, reversing the delDocs might be a good idea, but we should do it in a
separate issue and as a hard switch (not allowing IndexReader & Co. to implement both variants).
The method getNotDeletedDocs() should also be renamed to getVisibleDocs() or similar [I don't
like double negation].
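
To make the wrapper idea concrete, here is a minimal sketch of how I read it. All names are hypothetical stand-ins, not the classes from the patch: Bits is a local two-method interface, and I assume the combined result keeps the delDocs convention (a set bit means "skip this doc").

```java
public final class SkipBitsSketch {
    /** Local stand-in for a random-access bit set (assumption, not the patch's type). */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Array-backed Bits, used only to feed the demo. */
    static final class ArrayBits implements Bits {
        private final boolean[] bits;
        ArrayBits(boolean... bits) { this.bits = bits; }
        public boolean get(int index) { return bits[index]; }
        public int length() { return bits.length; }
    }

    /** Skip a doc if it is deleted OR the filter does NOT accept it. */
    static final class OrNotBits implements Bits {
        private final Bits delDocs;
        private final Bits filterDocs;
        OrNotBits(Bits delDocs, Bits filterDocs) {
            this.delDocs = delDocs;
            this.filterDocs = filterDocs;
        }
        public boolean get(int index) {
            return delDocs.get(index) || !filterDocs.get(index);
        }
        public int length() { return filterDocs.length(); }
    }

    /** No deletions: skip exactly the docs the filter rejects. */
    static final class NotBits implements Bits {
        private final Bits bits;
        NotBits(Bits bits) { this.bits = bits; }
        public boolean get(int index) { return !bits.get(index); }
        public int length() { return bits.length(); }
    }

    /** Picks the wrapper depending on whether the segment has deletions. */
    static Bits skipDocs(Bits delDocs, Bits filterDocs) {
        return delDocs == null
            ? new NotBits(filterDocs)
            : new OrNotBits(delDocs, filterDocs);
    }

    public static void main(String[] args) {
        Bits del = new ArrayBits(false, true, false, false);
        Bits filter = new ArrayBits(true, true, false, true);
        Bits skip = skipDocs(del, filter);
        for (int doc = 0; doc < skip.length(); doc++) {
            System.out.println("doc " + doc + " skipped=" + skip.get(doc));
        }
    }
}
```

With this shape the searcher keeps treating the combined Bits exactly like delDocs, so nothing downstream needs to know about an inverted convention.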

About the filters: I like the new API (it is as discussed before), so the DocIdSet is extended
by an optional getBits() method, defaulting to null.
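
As a rough illustration of that API shape (a sketch only; SketchDocIdSet, the two subclasses and countMatches are hypothetical names, and the iterator type is borrowed from the JDK rather than Lucene): a consumer asks for getBits() first and falls back to the iterator when it gets null.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public final class DocIdSetSketch {
    /** Local stand-in for a random-access bit set. */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Iterator is mandatory; random access is optional and defaults to null. */
    abstract static class SketchDocIdSet {
        abstract PrimitiveIterator.OfInt iterator();
        Bits getBits() { return null; }
    }

    /** Iterator-only set backed by a sorted array of doc ids. */
    static final class SortedIntDocIdSet extends SketchDocIdSet {
        private final int[] docs;
        SortedIntDocIdSet(int... docs) { this.docs = docs; }
        PrimitiveIterator.OfInt iterator() { return Arrays.stream(docs).iterator(); }
    }

    /** Set that can also answer get(doc) directly. */
    static final class BitArrayDocIdSet extends SketchDocIdSet {
        private final boolean[] bits;
        BitArrayDocIdSet(boolean... bits) { this.bits = bits; }
        PrimitiveIterator.OfInt iterator() {
            return IntStream.range(0, bits.length).filter(i -> bits[i]).iterator();
        }
        @Override
        Bits getBits() {
            return new Bits() {
                public boolean get(int index) { return bits[index]; }
                public int length() { return bits.length; }
            };
        }
    }

    /** Uses random access when offered, otherwise walks the iterator. */
    static int countMatches(SketchDocIdSet set) {
        Bits bits = set.getBits();
        int count = 0;
        if (bits != null) {
            for (int doc = 0; doc < bits.length(); doc++) {
                if (bits.get(doc)) count++;
            }
        } else {
            PrimitiveIterator.OfInt it = set.iterator();
            while (it.hasNext()) {
                it.nextInt();
                count++;
            }
        }
        return count;
    }
}
```

Existing DocIdSet implementations would not have to change under this scheme, since the null default keeps them on the iterator path.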

About the impls: FieldCacheRangeFilter can also implement getBits() directly, as FieldCache
is random access. Its DocIdSet should just return its own Bits impl that performs the range
check in get(index).
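
Roughly what I have in mind, as a sketch under assumptions: the int[] stands in for a per-document FieldCache array, and Bits and rangeBits are hypothetical local names, not FieldCacheRangeFilter's actual code.

```java
public final class RangeBitsSketch {
    /** Local stand-in for a random-access bit set. */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /**
     * Returns a Bits view over per-document values. Nothing is materialized:
     * the inclusive range check runs lazily on each get(index) call.
     */
    static Bits rangeBits(final int[] values, final int lower, final int upper) {
        return new Bits() {
            public boolean get(int index) {
                int v = values[index];
                return v >= lower && v <= upper;
            }
            public int length() { return values.length; }
        };
    }

    public static void main(String[] args) {
        int[] values = {5, 12, 7, 30}; // one value per doc id
        Bits bits = rangeBits(values, 6, 12);
        for (int doc = 0; doc < bits.length(); doc++) {
            System.out.println("doc " + doc + " accepted=" + bits.get(doc));
        }
    }
}
```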

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>                 Key: LUCENE-1536
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>         Attachments:, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).

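For readers skimming the quoted test notes, the difference between the baseline and method "high" can be sketched as follows. This is a simplified illustration, not the benchmark code: query hits and the iterator-style filter are modeled as sorted int arrays, the random-access filter as a boolean array, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class FilterApplySketch {
    /** Baseline style: intersect two sorted doc-id sequences by advancing both. */
    static List<Integer> applyAsIterator(int[] queryDocs, int[] filterDocs) {
        List<Integer> hits = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < queryDocs.length && f < filterDocs.length) {
            if (queryDocs[q] == filterDocs[f]) {
                hits.add(queryDocs[q]);
                q++;
                f++;
            } else if (queryDocs[q] < filterDocs[f]) {
                q++;
            } else {
                f++;
            }
        }
        return hits;
    }

    /** "High" style: test every query hit against the random-access filter. */
    static List<Integer> applyRandomAccess(int[] queryDocs, boolean[] filterBits) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : queryDocs) {
            if (filterBits[doc]) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Both variants produce the same hits; they differ only in where the per-document cost is paid, which is what the quoted benchmark sets out to measure.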