lucene-solr-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: default text type and stop words
Date Tue, 06 Nov 2007 12:51:14 GMT
Another alternative is to use stopwords selectively, as in phrases or
other places where they have meaning.  In the past, stopword removal
was mostly done to save disk space and some computation, but disk is
cheap, and as for computation, well, stopwords can help you get better
results if handled right, so the computation cost may be worth it.  If
they truly were meaningless, why would they be in the language to
begin with? :-)
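
To make "selectively" concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code: the stopword list and the names tokenize and analyze_query are made up for illustration, and it assumes the index keeps stopwords so that phrases can still match them. Quoted phrases keep their stopwords; loose terms drop them.

```python
# Hypothetical stopword list and helper names, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def analyze_query(query):
    """Return (phrases, terms) for a query string.

    Text inside double quotes is treated as a phrase and keeps its
    stopwords. Everything else is treated as loose terms, and
    stopwords are removed from those.
    """
    phrases, terms = [], []
    # Splitting on '"' puts the quoted segments at odd indices.
    for i, segment in enumerate(query.split('"')):
        tokens = tokenize(segment)
        if not tokens:
            continue
        if i % 2 == 1:
            phrases.append(tokens)
        else:
            terms.extend(t for t in tokens if t not in STOPWORDS)
    return phrases, terms


phrases, terms = analyze_query('"an officer and a gentleman" the movie')
```

Here the phrase keeps "an", "and" and "a", while the loose term "the" is dropped and only "movie" remains.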

-Grant

On Nov 6, 2007, at 1:36 AM, Walter Underwood wrote:

> I also said, "Stopword removal is a reasonable default because it works
> fairly well for a general text corpus." Ultraseek keeps stopwords but
> most engines don't. I think it is fine as a default. I also think you
> have to understand stopwords at some point.
>
> wunder
>
> On 11/5/07 9:59 PM, "Chris Hostetter" <hossman_lucene@fucit.org> wrote:
>
>>
>> : This isn't a problem in Lucene or Solr. It is a result of the analyzers
>> : you have chosen to use. If you choose to remove stopwords, you will not
>> : be able to match stopwords.
>>
>> I believe paul's point was that this use of stopwords is in the "text"
>> fieldtype in the example schema.xml ... which many people use as is.
>>
>> I'm personally of the mindset that it's fine like it is.  While people who
>> understand that "an" is a stop word might ask "why does 'rating:PG AND
>> name:an' match 40K movies, it should match 0?" there is another (probably
>> larger) group of people who won't know how the search is implemented, or
>> that "an" is a stop word, and they will look at the same results and ask
>> "why am i getting 40K results? most of these don't have 'an' in the title?
>> i should only be getting X results."
>>
>> That second group of people aren't going to be any happier if you
>> give them 0 results instead -- at least this way people get some results
>> to work with.
>>
>> -Hoss
>
>
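
The 'rating:PG AND name:an' case quoted above can be reproduced with a toy model. This is a sketch in plain Python, not the Lucene query parser: the three-movie data set, the stopword list, and the names analyze and search are all hypothetical. It assumes that a clause whose text analyzes to no tokens simply stops restricting the result, which is the behavior the quoted example describes.

```python
# Hypothetical data and helper names, for illustration only.
STOPWORDS = {"a", "an", "the"}

MOVIES = {
    1: {"rating": "PG", "name": "An American Tail"},
    2: {"rating": "PG", "name": "Shrek"},
    3: {"rating": "R", "name": "An Officer and a Gentleman"},
}


def analyze(text, remove_stopwords=True):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def search(rating, name_query, remove_stopwords=True):
    """AND together a rating clause and a name clause.

    If analysis leaves the name clause with no tokens, that clause no
    longer restricts anything, so only the rating clause filters.
    """
    name_terms = analyze(name_query, remove_stopwords)
    hits = []
    for doc_id, doc in sorted(MOVIES.items()):
        if doc["rating"] != rating:
            continue
        doc_terms = set(analyze(doc["name"], remove_stopwords))
        if all(t in doc_terms for t in name_terms):
            hits.append(doc_id)
    return hits


with_removal = search("PG", "an")
without_removal = search("PG", "an", remove_stopwords=False)
```

With stopword removal, the name clause vanishes and every PG movie comes back; with stopwords kept, only the PG movie whose name contains "an" matches.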

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


