lucene-java-user mailing list archives

From Leo Galambos <galam...@com-os2.ms.mff.cuni.cz>
Subject RE: Analyzer stopwords
Date Sun, 13 Apr 2003 20:54:19 GMT
Why do you use stop words? IMHO it is very problematic, 'cuz:

1. You must handle word forms ("be" can be "are", "is", "was", ...), and
Lucene's stemmer cannot help you with that. Moreover, the stemmer uses a
number of endsWith() operations, which are not efficient.

2. F-measure goes down.

3. Many queries end up empty or return bad results, e.g. "to be or not
to be".

-g-
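
Point 3 above can be illustrated with a toy model of stop-word removal. This is only a sketch: `analyze` and `STOP_WORDS` are hypothetical names, the stop list is an illustrative assumption, and none of this is Lucene's analyzer API.

```python
# Toy model of stop-word removal; NOT Lucene's analyzer API.
# STOP_WORDS is an illustrative assumption, not Lucene's actual list.
STOP_WORDS = {"to", "be", "or", "not", "is", "the", "a"}


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


# Every term of the query is a stop word, so nothing is left to search for.
print(analyze("to be or not to be"))  # []

# An ordinary sentence only loses its stop words.
print(analyze("my house is blue"))    # ['my', 'house', 'blue']
```

With a list like this one, the whole query disappears before it ever reaches the index.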


On Fri, 11 Apr 2003, Eric Isakson wrote:

> Sounds reasonable if that is ok for your app.
> 
> Consider a simple example:
> When indexing you list 'house' as a stop word.
> When searching you do not list 'house' as a stop word.
> 
> You indexed the string "my house is blue"
> 
> You search for:
> +blue -house
> 
> Consequence:
> You still get a hit on "my house is blue" because "house" was not indexed.
> 
> Eric
> 
> -----Original Message-----
> From: Armbrust, Daniel C. [mailto:Armbrust.Daniel@mayo.edu] 
> Sent: Friday, April 11, 2003 12:00 PM
> To: 'Lucene Users List'
> Subject: RE: Analyzer stopwords
> 
> 
> So I should be able to safely use a list of stopwords while creating an
> index, and then not pass in any stopwords when searching the index, and
> get the same behavior as if I had passed in the list of stopwords - Since
> they weren't put into the index to begin with, they are impossible to
> search with regardless of the stopwords that I use for searching.
> 
> 
> 
> 
> ************************************ 
> 
> -----Original Message-----
> From: Eric Isakson [mailto:Eric.Isakson@sas.com] 
> Sent: Friday, April 11, 2003 10:44 AM
> To: Lucene Users List
> Subject: RE: Analyzer stopwords
> 
> 
> I think Otis's example was backwards...
> 
> Consider a simple example:
> When indexing you list 'house' as a stop word.
> When searching you do not list 'house' as a stop word.
> 
> Consequence:
> You did _not_ index 'house' terms, but you cannot find any documents if
> you search for 'house'.
> 
> The inverse is more interesting to make the point:
> When indexing you do not list 'house' as a stop word.
> When searching you list 'house' as a stop word.
> 
> Consequence:
> You did index 'house' terms, but you cannot find any documents if you
> search for 'house' (because it was removed from the query). Your index is
> bloated with terms that can never be successfully searched.
> 
> Eric
> 
> -----Original Message-----
> From: Armbrust, Daniel C. [mailto:Armbrust.Daniel@mayo.edu] 
> Sent: Friday, April 11, 2003 11:38 AM
> To: 'Lucene Users List'
> Subject: RE: Analyzer stopwords
> 
> 
> I'm missing something obvious, I know it.... But, if I list 'house' as a
> stop word when I am indexing, doesn't that mean that any occurrence of the
> word house is not indexed, and therefore not findable in the index?  So it
> wouldn't matter if 'house' was in my stopwords list when searching,
> because it's not findable anyway?
> 
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> Sent: Friday, April 11, 2003 10:13 AM
> To: Lucene Users List
> Subject: Re: Analyzer stopwords
> 
> 
> Yes.
> 
> Consider a simple example:
> When indexing you list 'house' as a stop word.
> When searching you do not list 'house' as a stop word.
> 
> Consequence:
> You did index 'house' terms, but you cannot find any documents if you search for 'house'.
> 
> Otis
> 
> 
> --- "Armbrust, Daniel C." <Armbrust.Daniel@mayo.edu> wrote:
> > If I use a StandardAnalyzer for indexing, is it important to provide
> > it with the same stop words list for searching, as I used for 
> > indexing?
> > 
> > Thanks,
> > 
> > Dan
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > 
> 
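
The mismatch cases discussed in this thread can be reproduced with a small toy inverted index. This is a sketch under stated assumptions: `analyze`, `build_index`, and `search` are hypothetical helpers, the query syntax is reduced to `+term` and `-term` clauses, and a query left with no required terms is simply treated as matching nothing. It does not use Lucene's API or query parser.

```python
# Toy inverted index; NOT Lucene. All names here are hypothetical.

def analyze(text, stop_words):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in stop_words]


def build_index(docs, stop_words):
    """Map each surviving term to the set of document ids containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in analyze(text, stop_words):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stop_words):
    """Evaluate a query made only of +term (required) and -term (prohibited)."""
    required, prohibited = [], []
    for clause in query.lower().split():
        sign, term = clause[0], clause[1:]
        if term in stop_words:
            continue  # the search-time stop list removes the term
        (required if sign == "+" else prohibited).append(term)
    if not required:
        return set()  # simplifying assumption: nothing required, no hits
    hits = set.intersection(*(index.get(t, set()) for t in required))
    for t in prohibited:
        hits -= index.get(t, set())
    return hits


docs = ["my house is blue"]

# Case A: 'house' is a stop word at index time only.
index_a = build_index(docs, stop_words={"house"})
print(search(index_a, "+house", stop_words=set()))        # set()
print(search(index_a, "+blue -house", stop_words=set()))  # {0}

# Case B: 'house' is a stop word at search time only.
index_b = build_index(docs, stop_words=set())
print("house" in index_b)                                 # True
print(search(index_b, "+house", stop_words={"house"}))    # set()
```

In case A the prohibited clause has no effect because 'house' never reached the index. In case B the term takes up space in the index but is stripped from every query.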

