lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Polites" <jason.poli...@gmail.com>
Subject Re: Stop words in index
Date Mon, 04 Sep 2006 07:17:17 GMT
right ok.  I may have been tinkering with the analyzer between searches.

Thanks

On 9/4/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
> : In the default StandardAnalyzer, the stop word list contains the word
> "on".
> : If I have a document which contains the phrase "Disney on Ice", the
> index
> : will show only "Disney" and "Ice", but not "on".
>
> : "Disney on Ice"
> :
> : With the quotations indicating the desire for an "exact match", the
> absence
> : of stop words in the index means this yields zero results.
> :
> : Am I going crazy here?
>
> you should still get a match *if* you use the same analyzer (with the same
> set of stop words) in your query parser as you did when you indexed your
> documents.  if so, then "Disney on Ice" should become a phrase search for
> the two successive tokens "disney" and "ice" which is what should have
> cone in your doc when it was indexed using that analyzer.
>
> :
> : On 9/3/06, Jason Polites <jason.polites@gmail.com> wrote:
> : >
> : > Roger that.  I'll double check my code.
> : >
> : > Thanks.
> : >
> : >
> : > On 9/3/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com > wrote:
> : > >
> : > > They shouldn't be in the index.  You must be using StandardAnalyzer
> : > > incorrectly, or maybe you think you are using it, but are really
> using
> : > > something else.
> : > >
> : > > Otis
> : > >
> : > > ----- Original Message ----
> : > > From: Jason Polites <jason.polites@gmail.com>
> : > > To: java-user@lucene.apache.org
> : > > Sent: Saturday, September 2, 2006 9:05:27 AM
> : > > Subject: Stop words in index
> : > >
> : > > Hey all,
> : > >
> : > > I am using the StandardAnalyzer with my own list of stop words
> (which is
> : > > more comprehensive than the default list), and my expectation was
> that
> : > > this
> : > > would omit these stop words from the index when data is indexed
> using
> : > > this
> : > > analyzer.  However, I am seeing stop words in the term vector for
> : > > documents
> : > > indexed with this analyzer.
> : > >
> : > > Is this expected behaviour?  Is there any way I can force these stop
> : > > words
> : > > to be omitted from the index?  Having them in the index is wreaking
> : > > havoc
> : > > with term vector analysis to determine document similarity.
> : > >
> : > > Thanks.
> : > >
> : > >
> : > >
> : > >
> : > >
> ---------------------------------------------------------------------
> : > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> : > > For additional commands, e-mail: java-user-help@lucene.apache.org
> : > >
> : > >
> : >
> :
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message