lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (LUCENE-2035) TokenSources.getTokenStream() does not assign positionIncrement
Date Sat, 27 Nov 2010 23:25:40 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-2035.
-----------------------------------

    Fix Version/s:     (was: 2.9.4)
                       (was: 3.0.3)
       Resolution: Fixed

Resolving again, as this issue will not be backported to the 2.9/3.0 branches.

> TokenSources.getTokenStream() does not assign positionIncrement
> ---------------------------------------------------------------
>
>                 Key: LUCENE-2035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2035
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Christopher Morris
>            Assignee: Mark Miller
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2035.patch, LUCENE-2035.patch, LUCENE-2305.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> TokenSources.StoredTokenStream does not assign positionIncrement information, so every
> token in the stream is treated as adjacent to the previous one. This breaks phrase
> highlighting in QueryScorer when the tokens are not contiguous.
>
> For example, consider a token stream that emits both the stemmed and the unstemmed
> version of each word: the fox (jump|jumped)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token
> stream will be: the fox jump jumped
> Now try a search and highlight for the phrase query "fox jumped". The search will
> correctly find the document, but the highlighter will fail to highlight the phrase
> because it thinks that there is an additional word between "fox" and "jumped". If we use
> the original token stream (from the analyzer), the highlighter works.
>
> Also consider the converse: the fox did not jump
> "not" is a stop word, and there is an option to increment the position to account for
> stop words: (the,0) (fox,1) (did,2) (jump,4)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), the token
> stream will be: (the,0) (fox,1) (did,2) (jump,3)
> So the phrase query "did jump" will cause the "did" and "jump" terms in the text "did
> not jump" to be highlighted, even though they are not adjacent. If we use the original
> token stream (from the analyzer), the highlighter works correctly.
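
The two cases in the description can be reproduced with a small stand-alone model. The sketch below does not use Lucene: the Token class, the helper names, and the "adjacent positions" phrase check are all assumptions made for illustration, and the real TokenSources and QueryScorer code is considerably more involved. It only shows how absolute positions follow from position increments, and how resetting every increment to 1 changes which two-word phrases appear adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionIncrementSketch {

    /** Hypothetical stand-in for a token: term text plus its position increment. */
    public static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Absolute positions: start at -1 and add each token's increment in turn. */
    public static List<Integer> positions(List<Token> tokens) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (Token t : tokens) {
            position += t.positionIncrement;
            result.add(position);
        }
        return result;
    }

    /** Models the reported behaviour: every token is treated as having increment 1. */
    public static List<Token> withoutIncrements(List<Token> tokens) {
        List<Token> result = new ArrayList<>();
        for (Token t : tokens) {
            result.add(new Token(t.term, 1));
        }
        return result;
    }

    /** Simplified two-term phrase check: 'second' must sit exactly one position after 'first'. */
    public static boolean phraseMatches(List<Token> tokens, String first, String second) {
        List<Integer> pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).term.equals(first)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).term.equals(second) && pos.get(j) == pos.get(i) + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    /** "the fox (jump|jumped)": the unstemmed form shares the stemmed form's position. */
    public static List<Token> stemmedExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("jump", 1), new Token("jumped", 0));
    }

    /** "the fox did not jump" with "not" removed and its position preserved as a gap. */
    public static List<Token> stopWordExample() {
        return Arrays.asList(new Token("the", 1), new Token("fox", 1),
                new Token("did", 1), new Token("jump", 2));
    }

    public static void main(String[] args) {
        System.out.println("analyzer positions:  " + positions(stopWordExample()));
        System.out.println("flattened positions: " + positions(withoutIncrements(stopWordExample())));
        System.out.println("\"fox jumped\" on analyzer stream:  "
                + phraseMatches(stemmedExample(), "fox", "jumped"));
        System.out.println("\"fox jumped\" on flattened stream: "
                + phraseMatches(withoutIncrements(stemmedExample()), "fox", "jumped"));
    }
}
```

In this model, "fox jumped" matches the analyzer stream but not the flattened one, while "did jump" matches only the flattened stream. These are the missed highlight and the spurious highlight described above.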

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

