lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8306) Allow iteration over the term positions of a Match
Date Sun, 20 May 2018 20:11:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482024#comment-16482024 ]

David Smiley commented on LUCENE-8306:
--------------------------------------

I like TermPostingsEnum.  I wish PostingsEnum simply already had the Term; on multiple
occasions I've had to add extra code (be it just a field or extra abstractions (classes)) to
pair a term with its PostingsEnum.  Surely I'm not the only one?  I wonder what keeping them
divorced (as we do today) gains us.  I think the TermPostingsEnum implementation could be less
code by simply extending FilterPostingsEnum.  Oh I see; that's not possible because PostingsEnum
is not an interface.  If TPE were declared to extend FilterPostingsEnum it'd work, though that
may feel a little iffy.  Perhaps once upon a time it was reasonable for most of these foundational
classes in Lucene to be abstract classes, but with Java 8 default methods on interfaces, I
question that now.
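To illustrate the interface point, here is a minimal, self-contained Java sketch. It does not use Lucene: SimplePostings, ArrayPostings and this TermPostingsEnum are hypothetical, heavily simplified stand-ins (a single document, String terms instead of BytesRef, and a made-up NO_MORE_POSITIONS sentinel instead of Lucene's freq()-bounded position loop). It shows how, if postings were an interface, a term-carrying wrapper could simply implement it by delegation while still inheriting shared behaviour from a default method.

```java
// Hypothetical, heavily simplified stand-in for Lucene's PostingsEnum.
// The real PostingsEnum is an abstract class with many more methods.
interface SimplePostings {
    // Made-up sentinel for this sketch only.
    int NO_MORE_POSITIONS = Integer.MAX_VALUE;

    int nextPosition();

    // Java 8 default method: shared behaviour without an abstract base class.
    default int countRemainingPositions() {
        int n = 0;
        while (nextPosition() != NO_MORE_POSITIONS) {
            n++;
        }
        return n;
    }
}

// Array-backed positions for a single document, used only for the demo.
final class ArrayPostings implements SimplePostings {
    private final int[] positions;
    private int idx = 0;

    ArrayPostings(int... positions) {
        this.positions = positions.clone();
    }

    @Override
    public int nextPosition() {
        return idx < positions.length ? positions[idx++] : NO_MORE_POSITIONS;
    }
}

// Hypothetical TermPostingsEnum: pairs a term with its postings by delegation.
// Because SimplePostings is an interface, the wrapper implements it directly
// instead of having to extend a filter base class.
final class TermPostingsEnum implements SimplePostings {
    private final String term;
    private final SimplePostings in;

    TermPostingsEnum(String term, SimplePostings in) {
        this.term = term;
        this.in = in;
    }

    String term() {
        return term;
    }

    @Override
    public int nextPosition() {
        return in.nextPosition();
    }
}

final class TermPostingsDemo {
    public static void main(String[] args) {
        TermPostingsEnum tpe = new TermPostingsEnum("fox", new ArrayPostings(3, 7, 12));
        System.out.println(tpe.term() + " first at " + tpe.nextPosition()); // fox first at 3
        System.out.println("remaining: " + tpe.countRemainingPositions()); // remaining: 2
    }
}
```

With a real abstract class, the same wrapper has to choose between extending PostingsEnum and re-implementing every delegating method, or extending FilterPostingsEnum, which is the "iffy" option mentioned above.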

The TermMatchCollector interface looks fine to me.  It gives implementations more freedom to
be efficient and makes them easier to write, though it does force the caller to collect
everything instead of iterating at its leisure and potentially stopping short.  I'm fine
with that.
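The push-vs-pull trade-off above can be sketched as follows. The TermMatchCollector signature used here, collectTermMatch(String term, int position), is an assumption for illustration and may not match the patch; PhraseMatch and CollectorDemo are hypothetical. The match drives the loop and reports every term, so the caller has no way to stop after the first one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; the actual TermMatchCollector in the
// LUCENE-8306 patch may have a different signature.
interface TermMatchCollector {
    void collectTermMatch(String term, int position);
}

// Hypothetical phrase match: consecutive terms starting at startPosition.
final class PhraseMatch {
    private final String[] terms;
    private final int startPosition;

    PhraseMatch(int startPosition, String... terms) {
        this.startPosition = startPosition;
        this.terms = terms.clone();
    }

    // Push style: the match owns the loop and reports every term;
    // the collector cannot ask it to stop early.
    void collectTermMatches(TermMatchCollector collector) {
        for (int i = 0; i < terms.length; i++) {
            collector.collectTermMatch(terms[i], startPosition + i);
        }
    }
}

final class CollectorDemo {
    // Gathers "term@position" strings through the callback.
    static List<String> describe(PhraseMatch match) {
        List<String> out = new ArrayList<>();
        match.collectTermMatches((term, pos) -> out.add(term + "@" + pos));
        return out;
    }

    public static void main(String[] args) {
        // prints [quick@4, brown@5, fox@6]
        System.out.println(describe(new PhraseMatch(4, "quick", "brown", "fox")));
    }
}
```

An iterator-based API would let the caller break out after the first term instead, at the cost of the implementation having to keep explicit iteration state between calls.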

Another way to handle the requirement of exposing each match is for MatchesIterator itself
to be able to iterate two things: both the spans (what it does now) and the positions within
them.  If it worked this way then I imagine it might as well extend PostingsEnum (or the
TermPostingsEnum here).  That might be quite useful since some MatchesIterator implementations
will effectively just be a wrapped PostingsEnum.  A similar alternative is to return a
TermPostingsEnum from a MatchesIterator to better differentiate what is being iterated (spans
vs. positions).  The biggest benefit of this approach (vs. your two approaches thus far) is
that we don't need any new abstraction... apart from TermPostingsEnum, which sort of counts,
but as I said, perhaps we can migrate to PostingsEnum exposing the Term?
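A rough sketch of the "iterate both spans and positions" idea, again self-contained and not Lucene's MatchesIterator API: SpanMatchesIterator, TermPosition and SpanDemo are hypothetical names, and each span is assumed to list its term positions in increasing order. next() moves span by span as MatchesIterator does today, while terms() exposes the term positions inside the current span, so a caller can stop early (here, after the first term of each match).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical (term, position) pair for one matching term.
final class TermPosition {
    final String term;
    final int position;

    TermPosition(String term, int position) {
        this.term = term;
        this.position = position;
    }

    @Override
    public String toString() {
        return term + "@" + position;
    }
}

// Hypothetical iterator over whole-match spans that can also expose the
// term positions of the current span. Not Lucene's MatchesIterator.
final class SpanMatchesIterator {
    private final List<List<TermPosition>> spans;
    private int current = -1;

    SpanMatchesIterator(List<List<TermPosition>> spans) {
        this.spans = spans;
    }

    // Span-level iteration; the accessors below are only valid after this returns true.
    boolean next() {
        current++;
        return current < spans.size();
    }

    int startPosition() {
        return spans.get(current).get(0).position;
    }

    int endPosition() {
        List<TermPosition> span = spans.get(current);
        return span.get(span.size() - 1).position;
    }

    // Position-level iteration within the current span; the caller may stop early.
    Iterator<TermPosition> terms() {
        return spans.get(current).iterator();
    }
}

final class SpanDemo {
    // For each span, reports "start-end:firstTerm" and ignores the remaining terms.
    static List<String> firstTermOfEachMatch(SpanMatchesIterator it) {
        List<String> out = new ArrayList<>();
        while (it.next()) {
            Iterator<TermPosition> terms = it.terms();
            if (terms.hasNext()) {
                out.add(it.startPosition() + "-" + it.endPosition() + ":" + terms.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<TermPosition>> spans = Arrays.asList(
            Arrays.asList(new TermPosition("quick", 4), new TermPosition("brown", 5)),
            Arrays.asList(new TermPosition("quick", 10), new TermPosition("brown", 11)));
        // prints [4-5:quick@4, 10-11:quick@10]
        System.out.println(firstTermOfEachMatch(new SpanMatchesIterator(spans)));
    }
}
```

Keeping terms() separate from next() is one way to make it explicit which level (span vs. position) is being iterated; returning a TermPostingsEnum-like object from the iterator would play the same role.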

> Allow iteration over the term positions of a Match
> --------------------------------------------------
>
>                 Key: LUCENE-8306
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8306
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8306.patch, LUCENE-8306.patch
>
>
> For multi-term queries such as phrase queries, the matches API currently just returns
> information about the span of the whole match.  It would be useful to also expose information
> about the matching terms within the phrase.  The same would apply to Spans and Interval queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


