lucene-dev mailing list archives

From David Spencer <dave-lucene-...@tropo.com>
Subject Re: more rigid stopword list ?
Date Thu, 22 Apr 2004 18:38:13 GMT
hgadm@cswebmail.com wrote:

> Dear all,
> 
> for my taste the stopwords included in Lucene (e.g.
> StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
> with the SnowballAnalyzer - and I guess also with the
> StandardAnalyzer) are not strict enough:
> 
> For example in a sentence with "we need ..." I would
> consider "we" and "need" as stopwords but they are not
> stripped by SnowballAnalyzer or StandardAnalyzer. 
> 
> Now:
> Is there a built-in way to strip stopwords more
> restrictively, or would I be better off creating my own
> analyzer with a more restrictive stopword list?
> 
> If so, are you aware of any more rigid lists? (A URI
> would be great!)

Have you seen this:

http://www.onjava.com/onjava/2003/01/15/examples/EnglishStopWords.txt

Though personally I would start with the default assumption that stop 
word lists are not needed at all unless you can "prove" you need one, e.g.
[1] the indexes are too big (though in theory stop words alone 
shouldn't cause that)
[2] you're doing some index analysis where you traverse terms and there 
are just too many
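
One rough way to "prove" it either way is to measure how many tokens a 
candidate list would actually drop from a sample of your text before you 
build an analyzer around it. The sketch below is plain Java, not the 
Lucene API: the class name, the method name and the short word list are 
all made up for illustration, and the split-on-non-alphanumerics 
tokenizer is only an approximation of what a real analyzer does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class StopWordCheck {

    // Illustrative "stricter" list: a few common stop words plus
    // "we" and "need" from the example sentence in the question.
    static final Set<String> STRICT_STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "the", "of", "to", "in", "is", "we", "need"));

    // Returns the fraction of tokens in the text that the given
    // stop list would remove (0.0 if the text has no tokens).
    static double stopFraction(String text, Set<String> stopWords) {
        // Crude tokenizer: lowercase, then split on anything that is
        // not an ASCII letter or digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        int total = 0;
        int stopped = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            total++;
            if (stopWords.contains(token)) {
                stopped++;
            }
        }
        return total == 0 ? 0.0 : (double) stopped / total;
    }
}
```

Calling StopWordCheck.stopFraction("We need a more rigid list", 
StopWordCheck.STRICT_STOP_WORDS) counts six tokens, three of which are on 
the list. If the fraction on your real data is small, the stricter list 
probably isn't buying you much.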



> 
> Thanks,
> 
> Holger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

