lucene-dev mailing list archives

From "Fatih Uzdilli (JIRA)"
Subject [jira] Created: (LUCENE-2724) BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets
Date Tue, 26 Oct 2010 12:28:24 GMT
BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets

                 Key: LUCENE-2724
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 3.0.2
            Reporter: Fatih Uzdilli

Line 65 of the BooleanFilter class contains an optimization for OpenBitSets, but the same
optimization is missing at line 62.

I would replace the existing line:
res = new OpenBitSetDISI(getDISI(shouldFilters, i, reader), reader.maxDoc());

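with the following code:
DocIdSet docIdSet = shouldFilters.get(i).getDocIdSet(reader);
if (docIdSet instanceof OpenBitSet) {
	// Fast path: OR the cached bit set directly instead of iterating over it.
	res = new OpenBitSetDISI(reader.maxDoc());
	res.or((OpenBitSet) docIdSet);
} else {
	// Fallback: build the result from the iterator, as before.
	res = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
}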

The same applies to lines 78 and 95, adjusted for the not and must filters.
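
As a rough, self-contained illustration of the fast path proposed above (this is not Lucene
code): the sketch below uses java.util.BitSet as a stand-in for OpenBitSet, and the names
DocIds, BitSetDocIds and toBitSet are hypothetical, introduced only for this example. It shows
the instanceof check that ORs a bit-set-backed result directly and falls back to iteration
otherwise.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public class BitSetFastPath {
    /** Hypothetical stand-in for a DocIdSet: something that can iterate doc ids. */
    public interface DocIds {
        PrimitiveIterator.OfInt iterator();
    }

    /** Hypothetical stand-in for an OpenBitSet: doc ids backed directly by a bit set. */
    public static final class BitSetDocIds implements DocIds {
        final BitSet bits;

        public BitSetDocIds(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public PrimitiveIterator.OfInt iterator() {
            return bits.stream().iterator();
        }
    }

    /** Copies the doc ids into a fresh bit set, using a bulk OR when possible. */
    public static BitSet toBitSet(DocIds docIds, int maxDoc) {
        BitSet res = new BitSet(maxDoc);
        if (docIds instanceof BitSetDocIds) {
            // Fast path: word-wise OR of the underlying bit set.
            res.or(((BitSetDocIds) docIds).bits);
        } else {
            // Fallback: set one bit per doc id returned by the iterator.
            PrimitiveIterator.OfInt it = docIds.iterator();
            while (it.hasNext()) {
                res.set(it.nextInt());
            }
        }
        return res;
    }

    public static void main(String[] args) {
        BitSet cached = new BitSet(16);
        cached.set(1);
        cached.set(5);
        cached.set(9);

        BitSet fast = toBitSet(new BitSetDocIds(cached), 16);
        BitSet slow = toBitSet(() -> IntStream.of(1, 5, 9).iterator(), 16);

        // Both paths produce the same set of doc ids.
        System.out.println(fast.equals(slow));
    }
}
```

Both branches yield the same result; only the cost differs, which is the point of the
proposed change.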

Without this change, the AND combination was up to 5 times slower in my test, in which two
filters were AND-combined, each returning a cached OpenBitSet, one with a cardinality of 15000
and the other with a cardinality of 13000. The result had a cardinality of 8300. That matters
if you do this 1000 times with many more documents.

The same should also be done for ChainedFilter in the method initialResult(..).

Also, the getDISI method in BooleanFilter should be replaced by a getDocIdSet(..) method.
This is useful because the docIdSet is retrieved in line 87 and then again in line 92 when it
is not of type OpenBitSet. This can also cause a performance problem if the getDocIdSet method
of a sub filter is not very fast.
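
A minimal sketch of the "retrieve once" idea, again using java.util.BitSet instead of Lucene
classes. SubFilter, DocIds, BitSetDocIds and andInto are hypothetical names for this example
only. The sub filter's result is fetched a single time and the same reference is used in both
branches, so a slow getDocIds call is not paid twice.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class SingleFetchAnd {
    /** Hypothetical stand-in for a DocIdSet. */
    public interface DocIds {
        PrimitiveIterator.OfInt iterator();
    }

    /** Hypothetical stand-in for an OpenBitSet-backed DocIdSet. */
    public static final class BitSetDocIds implements DocIds {
        final BitSet bits;

        public BitSetDocIds(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public PrimitiveIterator.OfInt iterator() {
            return bits.stream().iterator();
        }
    }

    /** Hypothetical stand-in for a sub filter whose getDocIds may be expensive. */
    public interface SubFilter {
        DocIds getDocIds();
    }

    /** ANDs the filter's doc ids into res, calling getDocIds exactly once. */
    public static void andInto(BitSet res, SubFilter filter, int maxDoc) {
        DocIds docIds = filter.getDocIds(); // fetched once, reused by both branches
        if (docIds instanceof BitSetDocIds) {
            res.and(((BitSetDocIds) docIds).bits);
        } else {
            BitSet tmp = new BitSet(maxDoc);
            PrimitiveIterator.OfInt it = docIds.iterator();
            while (it.hasNext()) {
                tmp.set(it.nextInt());
            }
            res.and(tmp);
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // A sub filter that counts how often its result is requested.
        SubFilter counting = () -> {
            calls.incrementAndGet();
            return () -> IntStream.of(2, 4, 6).iterator();
        };

        BitSet res = new BitSet(8);
        res.set(1, 5); // doc ids 1, 2, 3, 4
        andInto(res, counting, 8);

        System.out.println(res + " calls=" + calls.get()); // prints "{2, 4} calls=1"
    }
}
```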

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.