lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: QueryParser, phrases and stopwords
Date Thu, 16 Jun 2005 14:23:07 GMT
Are there any other issues or concerns with making this change to  
StopFilter?  Should we make this change in 1.9?  Or wait until after  
2.0 is released?

Mike - if you could create some test cases for this scenario and  
contribute your patch and tests to Bugzilla, then barring any  
objections, I'll apply it.

     Erik
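
As a starting point for such test cases, here is a minimal, self-contained sketch of the position-gap idea discussed in this thread. It deliberately does not use Lucene classes: StopGapSketch, Tok, filter, positions and phraseMatches are hypothetical names invented for illustration, and the stop word set is an assumption. It only models the behaviour: a stop filter that either drops or keeps the gaps, and an exact phrase match that compares relative positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopGapSketch {

    /** A kept term plus its distance, in positions, from the previously kept term. */
    public static final class Tok {
        public final String text;
        public final int positionIncrement;

        public Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Illustrative stop word set (an assumption, not Lucene's default list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "of", "the"));

    /**
     * Drops stop words. With keepGaps, each surviving token's increment
     * counts the stop words removed in front of it; without it, every
     * increment is 1, so removed words leave no trace.
     */
    public static List<Tok> filter(String text, boolean keepGaps) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(word)) {
                skipped++;
                continue;
            }
            out.add(new Tok(word, keepGaps ? skipped + 1 : 1));
            skipped = 0;
        }
        return out;
    }

    /** Absolute positions: the running sum of the position increments. */
    public static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).positionIncrement;
            pos[i] = p;
        }
        return pos;
    }

    /** Exact phrase match: same terms at the same relative positions. */
    public static boolean phraseMatches(String phrase, String doc, boolean keepGaps) {
        List<Tok> q = filter(phrase, keepGaps);
        List<Tok> d = filter(doc, keepGaps);
        if (q.isEmpty()) {
            return false;
        }
        int[] qp = positions(q);
        int[] dp = positions(d);
        for (int start = 0; start + q.size() <= d.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < q.size() && ok; i++) {
                ok = q.get(i).text.equals(d.get(start + i).text)
                        && dp[start + i] - dp[start] == qp[i] - qp[0];
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "climate of control";
        // Gaps dropped: the false hit Mike reported.
        System.out.println(phraseMatches("climate control", doc, false));   // true
        // Gaps kept: "control" is two positions after "climate", so no hit.
        System.out.println(phraseMatches("climate control", doc, true));    // false
        // The query side keeps its gap too, so the full phrase still matches.
        System.out.println(phraseMatches("climate of control", doc, true)); // true
    }
}
```

The third case is why the query side matters: the gap only helps if the phrase query also records relative positions, which is what the PhraseQuery change quoted below allows.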


On Jun 16, 2005, at 8:57 AM, Mike Barry wrote:

> Erik,
>    Thanks, I applied the changes found in version 150148 of
> StopFilter.java and they work great for me. I did remove the setting
> of position=1 before the return of the token since that seemed
> spurious to me. Here's a context diff of the current StopFilter.java
> and my changes:
>
> *** analysis/StopFilter.java.old        Thu Jun 16 07:42:28 2005
> --- analysis/StopFilter.java    Thu Jun 16 08:44:50 2005
> ***************
> *** 94,109 ****
>      * Returns the next input Token whose termText() is not a stop word.
>      */
>     public final Token next() throws IOException {
> -     int position = 1;
> -
>       // return the first non-stop word found
> !     for (Token token = input.next(); token != null; token = input.next()) {
> !       if (!stopWords.contains(token.termText)) {
> !         token.setPositionIncrement( position );
>           return token;
> -       }
> -       position++;
> -     }
>       // reached EOS -- return null
>       return null;
>     }
> --- 94,103 ----
>      * Returns the next input Token whose termText() is not a stop word.
>      */
>     public final Token next() throws IOException {
>       // return the first non-stop word found
> !     for (Token token = input.next(); token != null; token = input.next())
> !       if (!stopWords.contains(token.termText))
>           return token;
>       // reached EOS -- return null
>       return null;
>     }
>
>
>
>
> Erik Hatcher wrote:
>
>
>>
>> On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:
>>
>>
>>> I have a situation where a query such as "climate control" is
>>> returning documents with the phrase "climate of control".  (I'm
>>> using QueryParser).
>>>
>>> After searching, I found a similar issue on the mailing list from
>>> Greg Robertson, with a patch from Steve Rowe.
>>>
>>> Looking at the source repository for StopFilter.java, the patch was
>>> applied in November of 2003 and then reverted in Dec 2003 (by Erik),
>>> with the note:
>>>
>>> revert position increment change due to conflict with PhraseQuery
>>>
>>> (the patch incremented the token position to inhibit exact matching
>>> across removed stopword(s)).
>>>
>>> I couldn't find any info on how/why this approach conflicted with
>>> PhraseQuery. Can anyone enlighten me on this? Does anyone know of a
>>> way to inhibit exact matching across removed stopword(s)?
>>>
>>
>>
>> PhraseQuery originally did not account for gaps left in the terms of
>> the phrase.
>>
>> PhraseQuery was modified last year to allow for this though:
>>
>> r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5 lines
>>
>> PhraseQuery and PhrasePrefixQuery are extended. It's now
>> possible to specify the relative position of a term within
>> a phrase. This allows gaps and multiple terms at the same
>> position.
>> -----
>>
>> So we could change StopFilter to put the gaps back in safely now, I
>> think.
>>
>> Thoughts?
>>
>>     Erik
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>