lucene-java-user mailing list archives

From Mike Barry <MBa...@cos.com>
Subject Re: QueryParser, phrases and stopwords
Date Thu, 16 Jun 2005 12:57:08 GMT
Erik,
   Thanks. I applied the changes found in version 150148 of StopFilter.java
and they work great for me. I did remove the setting of position=1 before
the return of the token, since that seemed spurious to me. Here's a context
diff of the current StopFilter.java and my changes:

*** analysis/StopFilter.java.old        Thu Jun 16 07:42:28 2005
--- analysis/StopFilter.java    Thu Jun 16 08:44:50 2005
***************
*** 94,109 ****
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
-     int position = 1;
-
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next()) {
!       if (!stopWords.contains(token.termText)) {
!         token.setPositionIncrement( position );
          return token;
-       }
-       position++;
-     }
      // reached EOS -- return null
      return null;
    }
--- 94,103 ----
     * Returns the next input Token whose termText() is not a stop word.
     */
    public final Token next() throws IOException {
      // return the first non-stop word found
!     for (Token token = input.next(); token != null; token = input.next())
!       if (!stopWords.contains(token.termText))
          return token;
      // reached EOS -- return null
      return null;
    }
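
To make the effect of the position increment concrete, here is a minimal,
self-contained sketch. It does not use Lucene's classes: Tok, filter(),
exactPhraseMatches() and the keepGaps flag are hypothetical names invented
for illustration, and the phrase check is a simplified stand-in for what an
exact (slop 0) two-term phrase match needs, not PhraseQuery's actual
implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopGapSketch {

    // Hypothetical stand-in for a token: its text plus the distance
    // (in positions) from the previously emitted token.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Drops stop words. With keepGaps=true, each surviving token's increment
    // also counts the stop words removed just before it; with keepGaps=false,
    // every surviving token gets an increment of 1.
    static List<Tok> filter(List<String> words, Set<String> stopWords, boolean keepGaps) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            out.add(new Tok(word, keepGaps ? skipped + 1 : 1));
            skipped = 0;
        }
        return out;
    }

    // Simplified exact-phrase check: true if `second` occurs at the position
    // immediately after an occurrence of `first`.
    static boolean exactPhraseMatches(List<Tok> doc, String first, String second) {
        int pos = -1;            // absolute position of the current token
        int lastFirstPos = -100; // position of the most recent `first`, if any
        for (Tok t : doc) {
            pos += t.positionIncrement;
            if (t.text.equals(second) && lastFirstPos == pos - 1) {
                return true;
            }
            if (t.text.equals(first)) {
                lastFirstPos = pos;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("climate", "of", "control");
        Set<String> stop = new HashSet<>(Arrays.asList("of"));

        // No gaps: climate@0, control@1, so the phrase matches.
        System.out.println("without gaps: "
                + exactPhraseMatches(filter(words, stop, false), "climate", "control"));
        // Gaps kept: climate@0, control@2, so the phrase does not match.
        System.out.println("with gaps: "
                + exactPhraseMatches(filter(words, stop, true), "climate", "control"));
    }
}
```

Without gaps, "climate of control" collapses to two adjacent positions and
matches the phrase "climate control"; with the gap preserved, "control" sits
two positions after "climate" and the exact phrase no longer matches.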




Erik Hatcher wrote:

>
> On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:
>
>> I have a situation where a query such as "climate control" is returning
>> documents with the phrase "climate of control". (I'm using QueryParser.)
>>
>> After searching, I found a similar issue on the mailing list from
>> Greg Robertson, with a patch from Steve Rowe.
>>
>> Looking at the source repository for StopFilter.java, the patch was
>> applied in November 2003 and then reverted in December 2003 (by Erik),
>> with the note:
>>
>> revert position increment change due to conflict with PhraseQuery
>>
>> (The patch incremented the token position to inhibit exact matching
>> across removed stopword(s).)
>>
>> I couldn't find any info on how or why this approach conflicted with
>> PhraseQuery. Can anyone enlighten me on this? Does anyone know of a way
>> to inhibit exact matching across removed stopword(s)?
>
>
> PhraseQuery originally did not account for gaps left in the terms of
> the phrase.
>
> PhraseQuery was modified last year to allow for this though:
>
> r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5 lines
>
> PhraseQuery and PhrasePrefixQuery are extended. It's now
> possible to specify the relative position of a term within
> a phrase. This allows gaps and multiple terms at the same
> position.
> -----
>
> So we could change StopFilter to put the gaps back in safely now, I think.
>
> Thoughts?
>
>     Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

