lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Thu, 21 May 2009 09:24:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711533#action_12711533
] 

Shai Erera commented on LUCENE-1614:
------------------------------------

Hmmm ... there is a problem with MAX_VAL as sentinel:

# OpenBitSetIterator already has a nextDoc() method which returns -1 when exhausted. The current
patch used -1 as sentinel, and therefore it was ok. But when I changed it to MAX_VAL, test-tag
fail on TestOpenBitSet. We can document this change in "changes to back-compat policy" and
fix the test on trunk and tag.
# SortedVIntList returns an iterator which is a DISI. MAX_VAL is considered a valid value
for SortedVIntList (TestSortedVIntList.test03() validates that). So if we use it as sentinel,
we declare that MAX_VAL is invalid for SortedVIntList. Not sure if we can do that.

I think the second is the problematic one - how do we handle iterators for which MAX_VAL is
a valid value? I tend to say MAX_VAL should not be a valid value since by the name, *DocId*SetIterator,
we should return doc Ids, and MAX_VAL is used as sentinel elsewhere and is not even considered
a valid doc Id. Therefore those iterators should change their logic, if they rely on MAX_VAL
being a valid value.

BTW, besides the convenience, why should SortedVIntList expose a DocIdSetIterator? Nothing
implies the list will hold doc Ids, therefore why commit to an iterator which returns doc
Ids? If it's for convenience only, then maybe wrap that iterator with a true DISI where the
Lucene code will need a true DISI?

What do you think?

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead
of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
for the full discussion. The basic idea is to add variants to those two methods that return
the current doc they are at, to save successive calls to doc(). If there are no more docs,
return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
- the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message