lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 23730] New: - Token positioning disallows phrase matching across stopwords
Date Fri, 10 Oct 2003 15:41:16 GMT
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23730

           Summary: Token positioning disallows phrase matching across
                    stopwords
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
               URL: http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg04349.html
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: sarowe@syr.edu


The URL I gave points to an archived Lucene-User mailing list post, in which a
new user describes surprise that phrase queries still match when stopwords
appear between the phrase tokens in the original text.

I think that the default StopFilter.java implementation should apply the
position-adjusting behavior described in the Lucene API docs:
<URL:http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/Token.html#setPositionIncrement(int)>
"Set [the position increment] to values greater than one to inhibit exact phrase
matches. If, for example, one does not want phrases to match across removed stop
words, then one could build a stop word filter that removes stop words and also
sets the increment to the number of stop words removed before each non-stop
word. Then exact phrase queries will only match when the terms occur with no
intervening stop words."
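
To make the requested behavior concrete, here is a minimal sketch in plain Java.
It does not use Lucene classes: PositionedTerm, filter and absolutePositions are
hypothetical names standing in for Token and a stop filter. It also assumes the
increment for a kept term is 1 plus the number of stop words removed just before
it, so that each kept term stays at its original position in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StopGapSketch {
    /** Hypothetical stand-in for a token: a term plus its position increment. */
    static final class PositionedTerm {
        final String term;
        final int positionIncrement;

        PositionedTerm(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Drops stop words and widens the increment of the next kept term by the
     * number of stop words skipped (assumed rule: increment = 1 + skipped).
     */
    static List<PositionedTerm> filter(List<String> terms, Set<String> stopWords) {
        List<PositionedTerm> kept = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++;          // remember the gap instead of closing it
                continue;
            }
            kept.add(new PositionedTerm(term, 1 + skipped));
            skipped = 0;
        }
        return kept;
    }

    /** Turns increments into absolute positions, with the first position at 0. */
    static int[] absolutePositions(List<PositionedTerm> tokens) {
        int[] positions = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            positions[i] = position;
        }
        return positions;
    }
}
```

For the text "quick and the dead" with stop words "and" and "the", this sketch
keeps "quick" with increment 1 and "dead" with increment 3, giving positions 0
and 3. An exact phrase query for "quick dead" needs adjacent positions, so it
would no longer match. If every kept term got increment 1 instead, the positions
would be 0 and 1 and the phrase would match, which is the behavior that
surprised the user.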
