lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3225) Optimize TermsEnum.seek when caller doesn't need next term
Date Thu, 23 Jun 2011 07:05:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053675#comment-13053675 ]

Simon Willnauer commented on LUCENE-3225:
-----------------------------------------

Mike, this seems like a good improvement, but I think letting a user change the behavior of
method X by passing true/false to method Y is not good. It is error-prone, and it clutters
the seek method too; one boolean is enough there. I think we should instead let users pull
an exactMatchOnly TermsEnum, which supports only exact matches and throws a clear exception
if next is called. I know that makes things slightly harder, especially dealing with our
ThreadLocal-cached TermsEnum instances, but I think it is the better approach. Can we somehow
defer the extra CPU work to the term() call and make this entirely lazy?
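A minimal sketch of the alternative suggested above, in plain Java with `String` terms standing in for `BytesRef`. Everything here is hypothetical (`LazyTermsEnum`, its `seek`, `term`, `next` and `ceilingResolutions` methods are illustration names, not Lucene's API), and a `HashMap` lookup merely stands in for whatever cheap exact-match path a codec might have. `seek` only does the exact lookup; after a miss, the ceiling term is computed lazily the first time `term()` or `next()` needs it. In `exactMatchOnly` mode, `next()` (and `term()` after a miss) fails with a clear exception instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy enum, NOT Lucene's TermsEnum: seek() does a cheap exact lookup,
 * and the ceiling term after a miss is resolved lazily by term()/next().
 */
final class LazyTermsEnum {
  private final String[] sortedTerms;                              // distinct terms, sorted
  private final Map<String, Integer> ordByTerm = new HashMap<>();  // exact-match lookup
  private final boolean exactMatchOnly;

  private int ord = -1;             // current position; -1 = unpositioned
  private String pendingCeilingOf;  // target of a failed seek whose ceiling is not yet computed
  private int ceilingResolutions;   // how many times the extra ceiling work was actually done

  LazyTermsEnum(boolean exactMatchOnly, String... terms) {
    this.exactMatchOnly = exactMatchOnly;
    this.sortedTerms = terms.clone();
    Arrays.sort(sortedTerms);
    for (int i = 0; i < sortedTerms.length; i++) {
      ordByTerm.put(sortedTerms[i], i);
    }
  }

  /** Returns true on an exact hit; on a miss, only records the target. */
  boolean seek(String text) {
    Integer found = ordByTerm.get(text);
    if (found != null) {
      ord = found;
      pendingCeilingOf = null;
      return true;
    }
    ord = -1;
    pendingCeilingOf = text; // defer the ceiling search until someone asks for it
    return false;
  }

  /** Current term, or null if exhausted; resolves a pending ceiling on first use. */
  String term() {
    if (pendingCeilingOf != null) {
      if (exactMatchOnly) {
        throw new UnsupportedOperationException(
            "exactMatchOnly enum: no current term after a failed seek");
      }
      int idx = Arrays.binarySearch(sortedTerms, pendingCeilingOf); // negative: target absent
      ord = -idx - 1; // insertion point = ordinal of the ceiling term (length if none)
      pendingCeilingOf = null;
      ceilingResolutions++;
    }
    return (ord >= 0 && ord < sortedTerms.length) ? sortedTerms[ord] : null;
  }

  /** Advances to the next term in sorted order; unsupported in exactMatchOnly mode. */
  String next() {
    if (exactMatchOnly) {
      throw new UnsupportedOperationException("exactMatchOnly enum does not support next()");
    }
    term(); // resolve a pending ceiling first, so we advance past it
    if (ord >= sortedTerms.length) {
      return null; // already exhausted
    }
    ord++;
    return term();
  }

  int ceilingResolutions() {
    return ceilingResolutions;
  }
}
```

The design point is that callers who only test for existence (a failed `seek` followed by skipping the segment) never pay for the ceiling search, while callers who do want the ceiling get it transparently, without a second boolean on `seek`.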


> Optimize TermsEnum.seek when caller doesn't need next term
> ----------------------------------------------------------
>
>                 Key: LUCENE-3225
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3225
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-3225.patch
>
>
> Some codecs are able to save CPU if the caller is only interested in
> exact matches. E.g., the Memory codec and SimpleText can do a more
> efficient FSTEnum lookup if they know the caller doesn't need to know
> the term following the seek term.
> We have cases like this in Lucene, e.g. when IndexWriter (IW) deletes
> documents by Term: if the term is not found in a given segment, it
> doesn't need to know the ceiling term. Likewise when TermQuery looks
> up the term in each segment.
> I had done this change as part of LUCENE-3030, which is a new terms
> index that's able to avoid seeks for exact-only lookups, but now that
> we have the Memory codec, which can also save CPU, I think we should
> commit this today.
> The change adds a "boolean onlyExact" param to seek(BytesRef).
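A minimal sketch of what the proposed flag lets an implementation skip, assuming a toy in-memory dictionary. `ToyTermsEnum` is hypothetical, terms are `String`s rather than `BytesRef`, and the `HashMap` lookup merely stands in for the cheaper FST path a real codec might take. With `onlyExact = true`, a miss returns `NOT_FOUND` without computing the ceiling term; with `false`, the enum is positioned on the ceiling term (or returns `END`), as an ordinary seek would be.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy terms enum (String terms instead of BytesRef), NOT Lucene's code:
 * illustrates what a seek(term, onlyExact) flag lets an implementation skip.
 */
final class ToyTermsEnum {
  // Same three outcomes as Lucene's TermsEnum.SeekStatus.
  enum SeekStatus { END, FOUND, NOT_FOUND }

  private final String[] sortedTerms;                              // distinct terms, sorted
  private final Map<String, Integer> ordByTerm = new HashMap<>();  // exact-match lookup
  private int ord = -1; // current position; -1 = unpositioned

  ToyTermsEnum(String... terms) {
    sortedTerms = terms.clone();
    Arrays.sort(sortedTerms);
    for (int i = 0; i < sortedTerms.length; i++) {
      ordByTerm.put(sortedTerms[i], i);
    }
  }

  SeekStatus seek(String text, boolean onlyExact) {
    if (onlyExact) {
      // Caller only cares whether the term exists: no ceiling search needed.
      Integer found = ordByTerm.get(text);
      if (found == null) {
        ord = -1; // position is left undefined (here: unpositioned) on a miss
        return SeekStatus.NOT_FOUND;
      }
      ord = found;
      return SeekStatus.FOUND;
    }
    // Full seek: on a miss, position on the smallest term greater than text.
    int idx = Arrays.binarySearch(sortedTerms, text);
    if (idx >= 0) {
      ord = idx;
      return SeekStatus.FOUND;
    }
    int ceiling = -idx - 1; // insertion point
    if (ceiling == sortedTerms.length) {
      ord = -1;
      return SeekStatus.END; // no term >= text
    }
    ord = ceiling;
    return SeekStatus.NOT_FOUND;
  }

  /** Current term, or null if unpositioned. */
  String term() {
    return ord < 0 ? null : sortedTerms[ord];
  }
}
```

A caller like the per-segment delete-by-term or TermQuery lookup described above would presumably pass `onlyExact = true`, since on `NOT_FOUND` it simply moves on to the next segment.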

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


