lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3849) position increments should be implemented by TokenStream.end()
Date Mon, 20 Aug 2012 21:31:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438234#comment-13438234 ]

Robert Muir commented on LUCENE-3849:
-------------------------------------

{quote}
In addition, we should also check all Tokenizers to make sure they set the correct endOffset (end of stream).
{quote}

This is already checked by BaseTokenStreamTestCase. It's just that offsetAtt is currently
the only thing we consume from end(), and all tokenizers effectively "overwrite" it with
the correct values. So the analyzer tests already pass; the only buggy one was the keyword
tokenizer built into StringField.

                
> position increments should be implemented by TokenStream.end()
> --------------------------------------------------------------
>
>                 Key: LUCENE-3849
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3849
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 3.6, 4.0-ALPHA
>            Reporter: Robert Muir
>         Attachments: LUCENE-3849.patch
>
>
> If you have pages of a book as multivalued fields, with the default position increment
> gap of Analyzer.java (0), phrase queries won't work across pages if one ends with stopword(s).
> This is because the 'trailing holes' are not taken into account in end(). So I think that
> in TokenStream.end(), subclasses of FilteringTokenFilter (e.g. StopFilter) should do:
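> {code}
> super.end();
> // add the positions skipped after the last emitted token (the 'trailing holes')
> posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + skippedPositions);
> {code}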
> One problem is that these filters need to 'add' to the posinc, but currently nothing
> clears the attributes for end() [they are dirty, except offset, which is set by the tokenizer].
> Also, the indexer should be changed to pull posIncAtt from end().
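
To make the 'trailing holes' problem concrete, here is a minimal, self-contained sketch.
It is a toy model only and does not use Lucene's API: the class TrailingHoles, the method
index(), the stop word set and the sample pages are all hypothetical names and data chosen
for illustration. It assumes the indexer starts at position -1, adds each token's position
increment, and (only in the proposed behavior) also adds the increment reported at the end
of each value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TrailingHoles {
    // Hypothetical stop word list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("for", "the");

    /**
     * Toy indexer: maps each kept term to its absolute position across all
     * values of a multivalued field. Stop words are dropped, and each dropped
     * word increases the position increment of the next kept term. If
     * countTrailingHoles is true, stop words dropped at the end of a value
     * are also added to the running position (the behavior proposed above).
     */
    static Map<String, Integer> index(List<String> values, int gap, boolean countTrailingHoles) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        int position = -1;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                position += gap; // position increment gap between values
            }
            int skipped = 0;
            for (String term : values.get(i).split(" ")) {
                if (STOP_WORDS.contains(term)) {
                    skipped++; // a hole: remember it for the next kept term
                    continue;
                }
                position += 1 + skipped;
                skipped = 0;
                positions.put(term, position);
            }
            if (countTrailingHoles) {
                position += skipped; // what end() would report
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("please wait for the", "next page");

        Map<String, Integer> current = index(pages, 0, false);
        Map<String, Integer> proposed = index(pages, 0, true);

        // Distance between "wait" (page 1) and "next" (page 2).
        System.out.println(current.get("next") - current.get("wait"));   // prints 1
        System.out.println(proposed.get("next") - proposed.get("wait")); // prints 3
    }
}
```

In this model, the single-value text "please wait for the next page" puts "next" three
positions after "wait". Only the variant that counts trailing holes reproduces that
distance when the text is split across two values, which is why a phrase query spanning
the page break fails in the other variant.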
