lucene-java-user mailing list archives

From "Jason Polites"
Subject Stop words in index
Date Sat, 02 Sep 2006 13:05:27 GMT
Hey all,

I am using the StandardAnalyzer with my own list of stop words (which is
more comprehensive than the default list). I expected the analyzer to omit
these stop words from the index at indexing time.  However, I am seeing
stop words in the term vectors of documents indexed with this analyzer.

Is this expected behaviour?  Is there any way I can force these stop words
to be omitted from the index?  Having them in the index is wreaking havoc
with term vector analysis to determine document similarity.
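
To make the expectation concrete, here is a minimal sketch in plain Java of the behaviour I am after: tokens are lower-cased, anything on the stop list is dropped, and only the remaining terms are counted. It does not use the Lucene API at all; the class name, the method name and the short stop list are made up for illustration, and the tokenization is far cruder than what StandardAnalyzer does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class StopWordTermVector {
    // Hypothetical stop list for illustration; assumed to be lower-case
    // because tokens are lower-cased before the comparison below.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and", "is"));

    // Builds a term -> frequency map, skipping any token on the stop list.
    static Map<String, Integer> termFrequencies(String text, Set<String> stopWords) {
        Map<String, Integer> freqs = new TreeMap<>();
        // Crude tokenization: lower-case, then split on non-alphanumeric runs.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // drop empty fragments and stop words
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // prints {document=1, index=2}
        System.out.println(
                termFrequencies("The index of the document is the index", STOP_WORDS));
    }
}
```

The same filter could be applied to the terms read back from a stored term vector before computing similarity, as a stopgap if the stop words cannot be kept out of the index itself.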