lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5441) Decouple DocIdSet from OpenBitSet and FixedBitSet
Date Wed, 22 Oct 2014 15:18:34 GMT


Robert Muir commented on LUCENE-5441:

Just one general question: Is it really necessary for the iterator to have cost() as well? In my opinion,
it should be fine to call cost() on the DocIdSet. If you already have an iterator, why
call cost() on it? It returns the same value as the DocIdSet (in general). This would make the extra ctor
parameter for the FixedBitSetIterator obsolete.

Currently, cost() is defined on DocIdSetIterator and, of course, on its subclasses: DocsEnum &
co implement it as docFreq for postings lists, and e.g. TermScorer returns its docsEnum.cost().

This is used by ConjunctionScorer, minShouldMatch, FilteredQuery, etc. to decide how to execute
conjunctions and similar operations; ConjunctionScorer, for example, leads with the lowest-cost iterator.
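
To illustrate why a per-iterator cost() is useful to callers that only hold iterators, here is a minimal sketch of a cost-ordered conjunction. It does not use Lucene's real classes: SketchIterator, ArrayIterator, and ConjunctionSketch are hypothetical stand-ins invented for this example, and the array length plays the role that docFreq plays for a postings list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for a doc ID iterator (NOT Lucene's API).
abstract class SketchIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int docID();
    abstract int nextDoc();
    abstract int advance(int target); // move to the first doc >= target
    abstract long cost();             // estimate of how many docs may be visited
}

// Iterates a sorted array of doc IDs; cost() is the array length.
final class ArrayIterator extends SketchIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayIterator(int... docs) { this.docs = docs; }

    @Override int docID() {
        if (idx < 0) return -1;
        return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }
    @Override int nextDoc() { idx++; return docID(); }
    @Override int advance(int target) {
        do { idx++; } while (idx < docs.length && docs[idx] < target);
        return docID();
    }
    @Override long cost() { return docs.length; }
}

final class ConjunctionSketch {
    // Intersects the iterators, driving the loop with the cheapest one.
    static List<Integer> intersect(SketchIterator... iterators) {
        SketchIterator[] sorted = iterators.clone();
        Arrays.sort(sorted, Comparator.comparingLong(SketchIterator::cost));
        SketchIterator lead = sorted[0];
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        outer:
        while (doc != SketchIterator.NO_MORE_DOCS) {
            for (int i = 1; i < sorted.length; i++) {
                SketchIterator other = sorted[i];
                if (other.docID() < doc) other.advance(doc);
                if (other.docID() != doc) {
                    // Mismatch: skip the lead ahead to the other's position.
                    doc = lead.advance(other.docID());
                    continue outer;
                }
            }
            hits.add(doc); // every iterator is positioned on doc
            doc = lead.nextDoc();
        }
        return hits;
    }
}
```

The point of the sketch is that intersect() never sees a DocIdSet: it receives iterators only, so the cost estimate has to be reachable from the iterator itself.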

> Decouple DocIdSet from OpenBitSet and FixedBitSet
> -------------------------------------------------
>                 Key: LUCENE-5441
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/other
>    Affects Versions: 4.6.1
>            Reporter: Uwe Schindler
>             Fix For: Trunk
>         Attachments: LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch,
> Back from the times of Lucene 2.4 when DocIdSet was introduced, we somehow kept the stupid
"filters can return a BitSet directly" in the code. So lots of Filters return just FixedBitSet,
because DocIdSet is the superclass (ideally interface) of FixedBitSet.
> We should decouple that and *not* have FixedBitSet implement that abstract interface directly.
This leads to bugs, e.g. in BlockJoin, because it used Filters in a wrong way, relying on
them always returning bitsets. But some filters actually don't do this.
> I propose to let FixedBitSet (only in trunk, because that is a major backwards break) just
have a method {{asDocIdSet()}} that returns an anonymous instance of DocIdSet: bits() returns
the FixedBitSet itself, iterator() returns a new Iterator (like it always did), and the cost/cacheable
methods return static values.
> Filters in trunk would need to be changed like that:
> {code:java}
> FixedBitSet bits = ....
> ...
> return bits;
> {code}
> gets:
> {code:java}
> FixedBitSet bits = ....
> ...
> return bits.asDocIdSet();
> {code}
> As this method returns an anonymous DocIdSet, calling code can no longer rely on or check
whether the implementation behind it is a FixedBitSet.
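
The proposed asDocIdSet() could look roughly like the following sketch. It is not the patch attached to the issue and does not use Lucene's real classes: SketchBits, SketchDocIdSet, and SketchFixedBitSet are hypothetical, simplified stand-ins backed by java.util.BitSet. The values returned by cost() and isCacheable() are assumptions chosen for illustration; the description only says they are static.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Hypothetical random-access view (stand-in, NOT Lucene's Bits).
interface SketchBits {
    boolean get(int index);
    int length();
}

// Hypothetical stand-in for the abstract DocIdSet.
abstract class SketchDocIdSet {
    abstract PrimitiveIterator.OfInt iterator();
    abstract SketchBits bits();
    abstract long cost();
    abstract boolean isCacheable();
}

// The bit set is no longer a doc ID set itself; it only hands out a view.
final class SketchFixedBitSet implements SketchBits {
    private final BitSet words;
    private final int numBits;

    SketchFixedBitSet(int numBits) {
        this.words = new BitSet(numBits);
        this.numBits = numBits;
    }

    void set(int index) { words.set(index); }
    @Override public boolean get(int index) { return words.get(index); }
    @Override public int length() { return numBits; }

    SketchDocIdSet asDocIdSet() {
        final SketchFixedBitSet self = this;
        return new SketchDocIdSet() {
            // A fresh iterator over the set bits on every call.
            @Override PrimitiveIterator.OfInt iterator() { return self.words.stream().iterator(); }
            // Random access is served by the bit set itself.
            @Override SketchBits bits() { return self; }
            // Assumption: use the capacity as a fixed cost estimate.
            @Override long cost() { return self.numBits; }
            // Assumption: an in-memory bit set is always cacheable.
            @Override boolean isCacheable() { return true; }
        };
    }
}
```

Because the returned object is an anonymous subclass, an instanceof check against the bit set type fails, which is the decoupling the description asks for. Callers that need random access go through bits() instead.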
