lucene-solr-dev mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: default text type and stop words
Date Tue, 06 Nov 2007 06:36:23 GMT
I also said, "Stopword removal is a reasonable default because it works
fairly well for a general text corpus." Ultraseek keeps stopwords but
most engines don't. I think it is fine as a default. I also think you
have to understand stopwords at some point.

wunder

On 11/5/07 9:59 PM, "Chris Hostetter" <hossman_lucene@fucit.org> wrote:

> 
> : This isn't a problem in Lucene or Solr. It is a result of the analyzers
> : you have chosen to use. If you choose to remove stopwords, you will not
> : be able to match stopwords.
> 
> I believe Paul's point was that this use of stopwords is in the "text"
> fieldtype in the example schema.xml ... which many people use as is.
> 
> I'm personally of the mindset that it's fine like it is. While people who
> understand that "an" is a stop word might ask "why does 'rating:PG AND
> name:an' match 40K movies, it should match 0?" there is another (probably
> larger) group of people who won't know how the search is implemented, or
> that "an" is a stop word, and they will look at the same results and ask
> "why am I getting 40K results? Most of these don't have 'an' in the title.
> I should only be getting X results."
> 
> That second group of people aren't going to be any happier if you
> give them 0 results instead -- at least this way people get some results
> to work with.
> 
> -Hoss
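
To make the behavior discussed above concrete, here is a minimal sketch of why
'rating:PG AND name:an' can match every PG movie when the analyzer removes
stopwords. It is a toy model in plain Python, not the Lucene or Solr API. The
stopword list, the sample movies, and the names analyze, build_index, and
search_and are all assumptions made up for illustration. The sketch also
assumes that a query clause which analyzes to zero tokens is simply dropped.

```python
# Toy model of analysis-time stopword removal; NOT the Lucene or Solr API.
STOPWORDS = {"a", "an", "the", "of"}  # assumed list, for illustration only

def analyze(text, remove_stopwords=True):
    """Lowercase, split on whitespace, and optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

def build_index(docs, field, remove_stopwords=True):
    """Map each indexed token of one field to the ids of documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field], remove_stopwords):
            index.setdefault(token, set()).add(doc_id)
    return index

def search_and(docs, clauses, remove_stopwords=True):
    """AND together (field, text) clauses.

    A clause whose text analyzes to no tokens contributes no restriction,
    which models the clause being dropped from the query (an assumption).
    """
    result = set(docs)
    for field, text in clauses:
        index = build_index(docs, field, remove_stopwords)
        for token in analyze(text, remove_stopwords):
            result &= index.get(token, set())
    return result

# Hypothetical sample data.
MOVIES = {
    1: {"rating": "PG", "name": "An American Tail"},
    2: {"rating": "PG", "name": "Toy Story"},
    3: {"rating": "R", "name": "An Officer and a Gentleman"},
}

QUERY = [("rating", "PG"), ("name", "an")]
with_stopword_removal = search_and(MOVIES, QUERY, remove_stopwords=True)
without_stopword_removal = search_and(MOVIES, QUERY, remove_stopwords=False)
```

In this sketch, with stopword removal the name clause vanishes and both PG
movies come back. With stopwords kept, only the PG movie whose name contains
"an" matches.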


