lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Stop words in index
Date Sat, 02 Sep 2006 16:26:05 GMT
They shouldn't be in the index.  You must be using StandardAnalyzer incorrectly, or maybe you
think you are using it, but are really using something else.

Otis
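
To make the expected behaviour concrete, here is a minimal sketch in plain Java of what a stop filter does during analysis. It is an illustration only and not Lucene code: the class `StopWordSketch`, the method `termFrequencies`, and the stop list are all hypothetical names made up for this example. The point it tries to show is that a term dropped at analysis time never reaches the per-document term counts, so if stop words do show up in a term vector, the filter most likely never saw them in the form it expects, or a different analyzer was used at index time. As far as I know, StandardAnalyzer lowercases tokens before applying its stop filter, so the sketch keeps the stop list in lowercase too.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; this does NOT use or reproduce any Lucene class.
class StopWordSketch {

    // Hypothetical custom stop list, stored in lowercase (an assumption of this sketch).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "is", "a", "of", "and"));

    // Split on anything that is not a letter or digit, lowercase each token,
    // drop tokens found in the stop list, and count what is left.
    static Map<String, Integer> termFrequencies(String text, Set<String> stopWords) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                continue; // filtered here, so it never reaches the term counts
            }
            freqs.merge(term, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        String text = "The index is a list of terms and the terms of the index";
        // prints {index=2, list=1, terms=2}
        System.out.println(termFrequencies(text, STOP_WORDS));
    }
}
```

If the stop list in this sketch held "The" instead of "the", the lowercased token would slip through, which is one way a custom list can appear to be ignored. The other thing worth checking is that the analyzer instance carrying the custom list is the one actually handed to the code that writes the index.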

----- Original Message ----
From: Jason Polites <jason.polites@gmail.com>
To: java-user@lucene.apache.org
Sent: Saturday, September 2, 2006 9:05:27 AM
Subject: Stop words in index

Hey all,

I am using the StandardAnalyzer with my own list of stop words (which is
more comprehensive than the default list), and my expectation was that this
would omit these stop words from the index when data is indexed using this
analyzer.  However, I am seeing stop words in the term vector for documents
indexed with this analyzer.

Is this expected behaviour?  Is there any way I can force these stop words
to be omitted from the index?  Having them in the index is wreaking havoc
with term vector analysis to determine document similarity.

Thanks.



