lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <>
Subject [jira] Commented: (LUCENE-902) Check on PositionIncrement with StopFilter.
Date Mon, 04 Jun 2007 15:38:36 GMT


Steven Rowe commented on LUCENE-902:

Hi Toru,

I looked at your patch (though I didn't test it), and I noticed that it uses generics and
varargs, both Java 1.5 features.  Lucene core targets Java 1.4, so your patch needs to be
rewritten to use only Java 1.4 features.
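
As a rough illustration of that kind of rewrite (the class and method names below are hypothetical and are not taken from the patch), a varargs method returning a generic Set can be expressed with an explicit String[] parameter, a raw Set, and a cast on retrieval:

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical helper, not taken from the patch: shows Java 1.4-compatible
// replacements for generics and varargs.
class Java14StyleSketch {

    // Java 1.5 form (uses features unavailable in Java 1.4):
    //   static Set<String> makeSet(String... words)
    // Java 1.4 form: a raw Set and an explicit String[] parameter.
    static Set makeSet(String[] words) {
        Set result = new HashSet();
        for (int i = 0; i < words.length; i++) {
            result.add(words[i]);
        }
        return result;
    }

    // Without generics, elements come back as Object and need a cast.
    static int totalLength(Set words) {
        int total = 0;
        for (Iterator it = words.iterator(); it.hasNext();) {
            String word = (String) it.next();
            total += word.length();
        }
        return total;
    }
}
```

Callers then pass an array literal, e.g. makeSet(new String[] {"the", "a"}), instead of a variable-length argument list.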

I think I understand what you're going for (filtering out all tokens at the same position
as a stopword), and I think it's a useful addition to Lucene, since the naive "fix", i.e.
employing a StopFilter in a processing pipeline before a morphological analyzer, will negatively
impact the morphological analyzer's performance.  

However, this behavior should not be the default - StopFilter's current behavior is well-defined
and depended on by lots of people.  I think there are (at least :) ) two possible courses
of action here:

1. Include a getter/setter for a boolean field controlling whether to filter out tokens at
the same position as stopwords (call it, say,  "removeStopwordCollocates", where I mean "collocate",
as a noun, to denote tokens with the same position).  This field would be initialized to false,
to preserve existing behavior.

2. Change StopFilter to allow extension (remove the "final" in "public final class StopFilter
..."), and create a new class extending StopFilter that exhibits the behavior you want.  This
could start life in the sandbox.
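
A minimal standalone sketch of what option #1 might look like follows. Everything in it is an assumption made for illustration: Tok is a stand-in for Lucene's Token, the class does not extend TokenFilter, position increments of surviving tokens are left untouched, and it is written in current Java for readability rather than Java 1.4. Only the field name "removeStopwordCollocates" comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: models the proposed flag outside of Lucene.
class StopwordCollocateSketch {

    // Minimal stand-in for a Lucene Token: a term plus its position increment.
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    private final Set<String> stopWords;

    // Initialized to false so that existing behavior is preserved.
    private boolean removeStopwordCollocates = false;

    StopwordCollocateSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    boolean getRemoveStopwordCollocates() {
        return removeStopwordCollocates;
    }

    void setRemoveStopwordCollocates(boolean removeStopwordCollocates) {
        this.removeStopwordCollocates = removeStopwordCollocates;
    }

    // Groups tokens that share a position, then filters each group.
    List<Tok> filter(List<Tok> input) {
        List<Tok> output = new ArrayList<Tok>();
        List<Tok> samePosition = new ArrayList<Tok>();
        for (Tok tok : input) {
            // A positive increment starts a new position: emit the previous group.
            if (tok.positionIncrement > 0) {
                flush(samePosition, output);
            }
            samePosition.add(tok);
        }
        flush(samePosition, output);
        return output;
    }

    // Emits the surviving tokens of one position and empties the group.
    private void flush(List<Tok> group, List<Tok> output) {
        boolean hasStopword = false;
        for (Tok tok : group) {
            if (stopWords.contains(tok.term)) {
                hasStopword = true;
            }
        }
        // With the flag set, a stopword takes its collocates down with it.
        if (!(hasStopword && removeStopwordCollocates)) {
            for (Tok tok : group) {
                if (!stopWords.contains(tok.term)) {
                    output.add(tok);
                }
            }
        }
        group.clear();
    }
}
```

With "the" as the only stopword and the input the(+1), THE_VARIANT(+0), cat(+1), the default setting keeps THE_VARIANT and cat, while setRemoveStopwordCollocates(true) keeps only cat.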

I like option #1 better - this functionality, were it available, would quite likely be useful
to a significant proportion of Lucene's user base (albeit skewed toward non-Lucene-as-black-box

> Check on PositionIncrement  with StopFilter.
> --------------------------------------------
>                 Key: LUCENE-902
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Toru Matsuzawa
>         Attachments: stopfilter.patch, stopfilter20070604.patch
> The PositionIncrement set by the Tokenizer is not taken into account by StopFilter.
> When a stopword Token has a PositionIncrement of 1, StopFilter deletes it. However, a
> following Token whose PositionIncrement is 0 is not deleted.
> I think that Token should be deleted as well, because a Token with a PositionIncrement
> of 0 is treated as the same Token.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
