lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Stop words in index
Date Mon, 04 Sep 2006 06:52:40 GMT
: In the default StandardAnalyzer, the stop word list contains the word "on".
: If I have a document which contains the phrase "Disney on Ice", the index
: will show only "Disney" and "Ice", but not "on".

: "Disney on Ice"
:
: With the quotations indicating the desire for an "exact match", the absence
: of stop words in the index means this yields zero results.
:
: Am I going crazy here?

you should still get a match *if* you use the same analyzer (with the same
set of stop words) in your query parser as you did when you indexed your
documents.  if so, then "Disney on Ice" should become a phrase search for
the two successive tokens "disney" and "ice" which is what should have
cone in your doc when it was indexed using that analyzer.

:
: On 9/3/06, Jason Polites <jason.polites@gmail.com> wrote:
: >
: > Roger that.  I'll double check my code.
: >
: > Thanks.
: >
: >
: > On 9/3/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com > wrote:
: > >
: > > They shouldn't be in the index.  You must be using StandardAnalyzer
: > > incorrectly, or maybe you think you are using it, but are really using
: > > something else.
: > >
: > > Otis
: > >
: > > ----- Original Message ----
: > > From: Jason Polites <jason.polites@gmail.com>
: > > To: java-user@lucene.apache.org
: > > Sent: Saturday, September 2, 2006 9:05:27 AM
: > > Subject: Stop words in index
: > >
: > > Hey all,
: > >
: > > I am using the StandardAnalyzer with my own list of stop words (which is
: > > more comprehensive than the default list), and my expectation was that
: > > this
: > > would omit these stop words from the index when data is indexed using
: > > this
: > > analyzer.  However, I am seeing stop words in the term vector for
: > > documents
: > > indexed with this analyzer.
: > >
: > > Is this expected behaviour?  Is there any way I can force these stop
: > > words
: > > to be omitted from the index?  Having them in the index is wreaking
: > > havoc
: > > with term vector analysis to determine document similarity.
: > >
: > > Thanks.
: > >
: > >
: > >
: > >
: > > ---------------------------------------------------------------------
: > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: > > For additional commands, e-mail: java-user-help@lucene.apache.org
: > >
: > >
: >
:



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message