lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-902) Check on PositionIncrement with StopFilter.
Date Mon, 04 Jun 2007 02:24:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501080 ]

Hoss Man commented on LUCENE-902:
---------------------------------

without a unit test demonstrating an actual problem, i'm having a hard time understanding
what exactly the "bug" is in this issue.

from what i can tell based on the comments and my reading of the patch, Toru is concerned
about cumulative positionIncrements of tokens being lost when one of those tokens is a stop
word.  (ie: if indexing multiple names of movies in a Document about an actor, and using a
positionIncrement of "10" between each Field value (ie: movie name), indexing the values "Dirty
Harry" and "The Good the bad and the Ugly" could result in no gap between the tokens "harry"
and "good" since "the" is a stop word.)

is my understanding of the problem correct?

if so, then i'm not sure how this patch really addresses the problem ... besides the fact
that it treats "1" as a special case (the problem can come up with any positionIncrement)
it doesn't seem to make any allowance for the situation where multiple stop words appear in
sequence.
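
to make the scenario concrete, here is a minimal self-contained sketch of it. it does not use Lucene's API and it is not the attached patch: the `Tok` class, the stop word set, and both filter methods are hypothetical stand-ins, and putting the gap of 10 on the first token of the second value is an assumption made for illustration. it compares a filter that throws away the increments of removed stop words with one that carries them over to the next surviving token, for any increment value and for runs of several stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopGapSketch {
    // Hypothetical stand-in for a token; this is NOT Lucene's Token class.
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
    }

    // Assumed stop word set, chosen only for this example.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("the", "and"));

    // Removes stop words and discards their position increments.
    static List<Tok> dropIncrements(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (!STOP.contains(t.text)) out.add(t);
        }
        return out;
    }

    // Removes stop words but adds their increments to the next surviving token.
    static List<Tok> carryIncrements(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // increments of stop words removed since the last kept token
        for (Tok t : in) {
            if (STOP.contains(t.text)) {
                skipped += t.posInc;
            } else {
                out.add(new Tok(t.text, t.posInc + skipped));
                skipped = 0;
            }
        }
        return out;
    }

    // Absolute positions: start at -1 and add each increment, so the first token is at 0.
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc;
            pos[i] = p;
        }
        return pos;
    }

    // "Dirty Harry" followed by "The Good the bad and the Ugly", lowercased,
    // with the assumed gap of 10 on the first token of the second value.
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("dirty", 1), new Tok("harry", 1),
            new Tok("the", 10), new Tok("good", 1), new Tok("the", 1),
            new Tok("bad", 1), new Tok("and", 1), new Tok("the", 1),
            new Tok("ugly", 1));
    }

    public static void main(String[] args) {
        // Tokens kept in both cases: dirty, harry, good, bad, ugly
        System.out.println(Arrays.toString(positions(dropIncrements(sample()))));  // [0, 1, 2, 3, 4]
        System.out.println(Arrays.toString(positions(carryIncrements(sample())))); // [0, 1, 12, 14, 17]
    }
}
```

in the first output "harry" and "good" sit at adjacent positions, which is the lost gap described above. in the second, "good" keeps its distance from "harry", and "ugly" ends up 3 positions after "bad" because the increments of both "and" and "the" are carried forward.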

i'm also not clear on why non stop words immediately following stop words (ie: the "else if(flag)"
case) are not returned unless their positionIncrement is 1.





> Check on PositionIncrement  with StopFilter.
> --------------------------------------------
>
>                 Key: LUCENE-902
>                 URL: https://issues.apache.org/jira/browse/LUCENE-902
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Toru Matsuzawa
>         Attachments: stopfilter.patch
>
>
> PositionIncrement set with Tokenizer is not considered with StopFilter.
> When PositionIncrement of Token is 1, it is deleted by StopFilter. However, when PositionIncrement
> of Token following afterwards is 0, it is not deleted.
> I think that it is necessary to be deleted. Because it is thought same Token when PositionIncrement
> is 0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

