lucene-dev mailing list archives

From "Toru Matsuzawa (JIRA)" <>
Subject [jira] Commented: (LUCENE-902) Check on PositionIncrement with StopFilter.
Date Fri, 08 Jun 2007 09:14:25 GMT


Toru Matsuzawa commented on LUCENE-902:

Hi Hoss,
Thank you for your comments.

> 1) in future patches, could you please use 2 spaces instead of tabs?

Agreed; I will do so.

> 2) am i understanding correctly that the primary use case you are trying to address is
>  stop word removal when the stop word has synonyms with a position increment of 0 
> (the expectation being that the synonyms also be removed) ?

Your understanding is correct.
However, a synonym itself might be a stop word. 

> ... if so, wouldn't the most efficient thing be to do stop word removal before doing
> synonym expansion? (it means having a bigger stop word list - with all the synonyms -
> but that still seems better to me) ... are there other use cases i'm not understanding?
> i freely admit i don't understand the "Japanese morphological analysis" comment.

It is not realistic to maintain a stop word list that includes all the synonyms,
because building that list would require knowing the contents of every dictionary
that the morphological engine uses.
(The engine analyzes text using those dictionaries.)

> 3) i only glanced over the specifics of removeStopwordCollocatesNext() .. 
> but would promoting BufferedTokenStream from Solr simplify the code
>  (it seems to all be about buffering tokens) ...

I think the code would become more concise if BufferedTokenStream could be used.

> 4) it would be useful if the test case could clarify not only the expected tokens text
> concatenated together, but also what the expected positions of position increments are
> for the tokens... i was certainly confused by the title of this issue.

I agree with you. It would be better to compare them with expected tokens. 
I'm sorry to confuse you with my poor English.
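
A minimal sketch of such a comparison follows. It is only an illustration and is not taken from the attached patches: the class `TokenExpectation` and the method `firstMismatch` are hypothetical names, and tokens are represented as parallel arrays of term text and position increment rather than as Lucene Token objects.

```java
public class TokenExpectation {

  /**
   * Compares expected and actual tokens by term text AND position increment.
   * Returns null when they match, otherwise a message for the first difference.
   * Assumption: each texts array has the same length as its incs array.
   */
  public static String firstMismatch(String[] expTexts, int[] expIncs,
                                     String[] actTexts, int[] actIncs) {
    if (expTexts.length != actTexts.length) {
      return "expected " + expTexts.length + " tokens but got " + actTexts.length;
    }
    for (int i = 0; i < expTexts.length; i++) {
      if (!expTexts[i].equals(actTexts[i])) {
        return "token " + i + ": expected text '" + expTexts[i]
            + "' but got '" + actTexts[i] + "'";
      }
      if (expIncs[i] != actIncs[i]) {
        return "token " + i + " ('" + expTexts[i] + "'): expected position increment "
            + expIncs[i] + " but got " + actIncs[i];
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // The concatenated text is the same in both cases ("quickfox"),
    // but the second token's position increment differs.
    String msg = firstMismatch(
        new String[] {"quick", "fox"}, new int[] {1, 1},
        new String[] {"quick", "fox"}, new int[] {1, 0});
    System.out.println(msg);
  }
}
```

A check on concatenated text alone would pass in the `main` example, while this comparison reports the differing position increment.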

> Check on PositionIncrement  with StopFilter.
> --------------------------------------------
>                 Key: LUCENE-902
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Toru Matsuzawa
>         Attachments: stopfilter.patch, stopfilter20070604.patch, stopfilter20070605.patch
> StopFilter does not take into account the PositionIncrement set by the Tokenizer.
> When a Token with a PositionIncrement of 1 is a stop word, StopFilter deletes it.
> However, a following Token with a PositionIncrement of 0 is not deleted.
> I think it should be deleted as well, because a Token with a PositionIncrement of 0
> is considered to be the same Token as the one before it.
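
The behavior requested in the description above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the code in the attached patches: `StopSketch`, `Tok`, and `filter` are hypothetical names standing in for Lucene's StopFilter and Token, and the sketch assumes that a token with a position increment greater than 0 starts a new position and that tokens with an increment of 0 are stacked on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopSketch {

  /** Hypothetical stand-in for a token: term text plus position increment. */
  public static final class Tok {
    public final String text;
    public final int posInc;

    public Tok(String text, int posInc) {
      this.text = text;
      this.posInc = posInc;
    }

    @Override
    public String toString() {
      return text + "/" + posInc;
    }
  }

  /**
   * Removes stop words. When the token that starts a position (posInc > 0)
   * is a stop word, the tokens stacked on it (posInc == 0) are removed too.
   * A stacked token that is itself a stop word is removed in any case.
   * Position increments of the surviving tokens are left unchanged.
   */
  public static List<Tok> filter(List<Tok> in, Set<String> stopWords) {
    List<Tok> out = new ArrayList<Tok>();
    boolean positionDropped = false;
    for (Tok t : in) {
      boolean isStop = stopWords.contains(t.text);
      if (t.posInc > 0) {
        positionDropped = isStop; // a new position begins here
        if (!isStop) {
          out.add(t);
        }
      } else if (!positionDropped && !isStop) {
        out.add(t); // stacked on a token that was kept
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Set<String> stop = new HashSet<String>(Arrays.asList("the"));
    List<Tok> tokens = Arrays.asList(
        new Tok("quick", 1),
        new Tok("the", 1),  // stop word
        new Tok("le", 0),   // made-up synonym stacked on "the"
        new Tok("fox", 1));
    System.out.println(filter(tokens, stop)); // prints [quick/1, fox/1]
  }
}
```

Without the `positionDropped` check, "le" would survive and, having a position increment of 0, would end up at the same position as "quick".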

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
