lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (LUCENE-902) Check on PositionIncrement with StopFilter.
Date Thu, 07 Jun 2007 04:08:25 GMT


Hoss Man commented on LUCENE-902:

A few comments in no particular order...

1) in future patches, could you please use 2 spaces instead of tabs?

2) am i understanding correctly that the primary use case you are trying to address is stop
word removal when the stop word has synonyms with a position increment of 0 (the expectation
being that the synonyms should also be removed)? ... if so, wouldn't the most efficient thing be
to do stop word removal before doing synonym expansion?  (it means having a bigger stop word
list - with all the synonyms - but that still seems better to me) ... are there other use
cases i'm not understanding? ... i freely admit i don't understand the "Japanese morphological
analysis" comment.

3) i only glanced over the specifics of removeStopwordCollocatesNext() ... but would promoting
BufferedTokenStream from Solr simplify the code? (it seems to be all about buffering tokens)

4) it would be useful if the test case could clarify not only the expected token text concatenated
together, but also what the expected positions or position increments are for the tokens...
i was certainly confused by the title of this issue.
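
To make points 2 and 4 concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Solr classes: Tok, removeStops and expandSynonyms are hypothetical stand-ins, and the stop filter here simply drops tokens without adjusting the increments of the survivors (an assumption). It prints each token's text together with its position increment, which is the kind of detail point 4 asks the test case to show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StopBeforeSynonyms {
    /** Hypothetical stand-in for a token: text plus position increment. */
    public static final class Tok {
        final String text;
        final int posInc;
        public Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
        @Override public String toString() { return text + "(+" + posInc + ")"; }
    }

    /** Drops stop words by text only; increments of surviving tokens are left untouched. */
    public static List<Tok> removeStops(List<Tok> in, Set<String> stops) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (!stops.contains(t.text)) out.add(t);
        }
        return out;
    }

    /** After each token, emits its synonyms at the same position (increment 0). */
    public static List<Tok> expandSynonyms(List<Tok> in, Map<String, List<String>> syns) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t);
            for (String s : syns.getOrDefault(t.text, Collections.<String>emptyList())) {
                out.add(new Tok(s, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = Arrays.asList(new Tok("the", 1), new Tok("quick", 1), new Tok("fox", 1));
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        Map<String, List<String>> syns = new HashMap<>();
        syns.put("the", Arrays.asList("a"));
        syns.put("quick", Arrays.asList("fast"));

        // Stop removal first: "the" is gone before any synonym of it is generated.
        System.out.println(expandSynonyms(removeStops(toks, stops), syns));
        // Synonym expansion first: the stacked synonym "a" survives the stop filter.
        System.out.println(removeStops(expandSynonyms(toks, syns), stops));
    }
}
```

With this input the first ordering yields [quick(+1), fast(+0), fox(+1)], while the second leaves a(+0) behind, because a text-only stop filter never looks at the increment of the token that follows a removed stop word.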

> Check on PositionIncrement  with StopFilter.
> --------------------------------------------
>                 Key: LUCENE-902
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Toru Matsuzawa
>         Attachments: stopfilter.patch, stopfilter20070604.patch, stopfilter20070605.patch
> PositionIncrement set with Tokenizer is not considered with StopFilter.
> When PositionIncrement of Token is 1, it is deleted by StopFilter. However, when PositionIncrement
> of Token following afterwards is 0, it is not deleted.
> I think that it is necessary to be deleted. Because it is thought same Token when PositionIncrement
> is 0.
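
The behavior requested in the description above can be sketched roughly as follows. This is plain Java, not the attached patch and not Lucene's StopFilter: Tok and removeStopsWithStacked are hypothetical names, and the rule used here (a removed stop word with a positive increment also takes the following increment-0 tokens with it) is one assumed reading of the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StackedStopRemoval {
    /** Hypothetical stand-in for a token: text plus position increment. */
    public static final class Tok {
        final String text;
        final int posInc;
        public Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
        @Override public String toString() { return text + "(+" + posInc + ")"; }
    }

    /**
     * Removes stop words and, when the removed stop word opened a new position
     * (increment > 0), also removes the tokens stacked on it (increment 0).
     */
    public static List<Tok> removeStopsWithStacked(List<Tok> in, Set<String> stops) {
        List<Tok> out = new ArrayList<>();
        boolean dropping = false; // true while skipping tokens stacked on a removed stop word
        for (Tok t : in) {
            if (t.posInc == 0 && dropping) continue;
            boolean isStop = stops.contains(t.text);
            dropping = isStop && t.posInc > 0;
            if (!isStop) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = Arrays.asList(
            new Tok("the", 1), new Tok("a", 0),
            new Tok("quick", 1), new Tok("fast", 0),
            new Tok("fox", 1));
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        System.out.println(removeStopsWithStacked(toks, stops));
    }
}
```

For this input the sketch prints [quick(+1), fast(+0), fox(+1)]: "a" is dropped along with "the", while "fast" stays because the token it is stacked on was kept.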

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
