lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <>
Subject [jira] Commented: (LUCENE-2724) BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets
Date Tue, 26 Oct 2010 16:26:22 GMT


Paul Elschot commented on LUCENE-2724:

Indeed, that should speed things up.

The first case in the replacing code is actually only a copy of the underlying OpenBitSet,
so perhaps it could be simplified to do just that.

And some common code for this between ChainedFilter and BooleanFilter could perhaps be moved
to OpenBitSetDISI.
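
A minimal sketch of what such a shared helper might look like. To keep it runnable without Lucene on the classpath, it uses java.util.BitSet as a stand-in for OpenBitSet, and every type and method name here (SketchDocIdSet, SketchBitDocIdSet, SketchBitSetSupport.initialResult) is hypothetical rather than part of the Lucene API. The bit-set-backed case is just a copy; everything else falls back to iterating the doc IDs.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Hypothetical stand-in for Lucene's DocIdSet: enumerates doc IDs in ascending order.
interface SketchDocIdSet {
    PrimitiveIterator.OfInt iterator();
}

// Hypothetical stand-in for OpenBitSet: a doc ID set backed directly by a bit set.
final class SketchBitDocIdSet implements SketchDocIdSet {
    final BitSet bits;

    SketchBitDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();
    }
}

// Hypothetical home for the code that both filters could share.
final class SketchBitSetSupport {
    private SketchBitSetSupport() {}

    // Builds a fresh, mutable result set from the first filter's doc ID set.
    static BitSet initialResult(SketchDocIdSet set, int maxDoc) {
        if (set instanceof SketchBitDocIdSet) {
            // Fast path: the set is already a bit set, so a plain copy is enough.
            return (BitSet) ((SketchBitDocIdSet) set).bits.clone();
        }
        // General case: set one bit per doc ID returned by the iterator.
        BitSet result = new BitSet(maxDoc);
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            result.set(it.nextInt());
        }
        return result;
    }
}
```

The copy matters because the result is modified in place by the later filters, while the source set may be cached and must stay untouched.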

> BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets
> ----------------------------------------------------------------------
>                 Key: LUCENE-2724
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 3.0.2
>            Reporter: Fatih Uzdilli
> In line 65 of the BooleanFilter class there is an optimization for OpenBitSets, but the
> same optimization is missing in line 62.
> I would replace the existing line:
> {code}
> res = new OpenBitSetDISI(getDISI(shouldFilters, i, reader), reader.maxDoc());
> {code} 
> with the following code:
> {code}
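> // Fetch the DocIdSet once so both branches can reuse it.
> DocIdSet docIdSet = shouldFilters.get(i).getDocIdSet(reader);
> if (docIdSet instanceof OpenBitSet) {
> 	// Fast path: OR the OpenBitSet into the result directly instead of iterating doc IDs.
> 	res = new OpenBitSetDISI(reader.maxDoc());
> 	res.or((OpenBitSet) docIdSet);
> } else {
> 	// General case: fill the result by iterating over the doc IDs.
> 	res = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
> }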
> {code}
> The same applies to lines 78 and 95, adjusted for the not and must filters.
> Without this optimization, the AND-combination was up to 5 times slower in my test, in
> which two filters, each returning a cached OpenBitSet, were AND-combined: one with a
> cardinality of 15000 and the other with a cardinality of 13000. The result had a
> cardinality of 8300. That matters if you do this 1000 times with many more documents.
> The same must also be done for ChainedFilter in the method initialResult(..).
> Also, the getDISI method in BooleanFilter should be replaced by a getDocIdSet(..) method.
> This is useful because the docIdSet is retrieved in line 87 and then again in line 92 when
> it is not of type OpenBitSet. The double retrieval may also cause a performance problem if
> the getDocIdSet method of a sub filter is not very fast.
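
The "retrieve once" point can be illustrated with a small, self-contained sketch. It is not the BooleanFilter code: java.util.BitSet stands in for OpenBitSet so that it runs without Lucene, and all names (OnceDocIdSet, OnceBitDocIdSet, OnceFilter, OnceCountingFilter, OnceMustCombiner.andAll) are hypothetical. Each filter's doc ID set is fetched exactly once, and the same reference then serves either the bit-set fast path or the iterator fallback.

```java
import java.util.BitSet;
import java.util.List;
import java.util.PrimitiveIterator;

// Hypothetical stand-in for Lucene's DocIdSet.
interface OnceDocIdSet {
    PrimitiveIterator.OfInt iterator();
}

// Hypothetical stand-in for OpenBitSet.
final class OnceBitDocIdSet implements OnceDocIdSet {
    final BitSet bits;

    OnceBitDocIdSet(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator();
    }
}

// Hypothetical stand-in for a sub filter whose getDocIdSet may be expensive.
interface OnceFilter {
    OnceDocIdSet getDocIdSet();
}

// Test helper that counts how often its doc ID set is requested.
final class OnceCountingFilter implements OnceFilter {
    private final OnceDocIdSet set;
    int calls;

    OnceCountingFilter(OnceDocIdSet set) {
        this.set = set;
    }

    @Override
    public OnceDocIdSet getDocIdSet() {
        calls++;
        return set;
    }
}

final class OnceMustCombiner {
    private OnceMustCombiner() {}

    // AND-combines all filters, asking each one for its doc ID set only once.
    static BitSet andAll(List<? extends OnceFilter> mustFilters, int maxDoc) {
        BitSet res = null;
        for (OnceFilter filter : mustFilters) {
            OnceDocIdSet set = filter.getDocIdSet(); // single fetch, reused below
            BitSet bits;
            if (set instanceof OnceBitDocIdSet) {
                // Fast path: use the backing bit set directly.
                bits = ((OnceBitDocIdSet) set).bits;
            } else {
                // Fallback: materialize the iterator into a temporary bit set.
                bits = new BitSet(maxDoc);
                PrimitiveIterator.OfInt it = set.iterator();
                while (it.hasNext()) {
                    bits.set(it.nextInt());
                }
            }
            if (res == null) {
                res = (BitSet) bits.clone(); // copy so cached sets are never modified
            } else {
                res.and(bits); // word-wise intersection
            }
        }
        // In this sketch, an empty filter list yields an empty result.
        return res == null ? new BitSet(maxDoc) : res;
    }
}
```

Wrapping each sub filter in OnceCountingFilter and checking its calls field after andAll is one way to confirm that no filter is asked for its doc ID set twice.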

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
