lucene-dev mailing list archives

From Tate Avery <tate.av...@nstein.com>
Subject RE: more rigid stopword list ?
Date Thu, 22 Apr 2004 16:57:32 GMT

It doesn't have 'need' but it has plenty of others....

http://snowball.tartarus.org/english/stop.txt
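
A list like that one can be loaded into a plain set of words before handing it to an analyzer. The following is only a minimal sketch, assuming the file puts stop words at the start of a line and uses a vertical bar (|) to begin a comment; check the file itself before relying on this. The class name SnowballStopListParser and the sample text are made up for illustration and are not part of Lucene or Snowball.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SnowballStopListParser {

    /**
     * Extracts stop words from text in the assumed format:
     * everything after '|' on a line is a comment, and the
     * remaining words are separated by whitespace.
     */
    public static Set<String> parse(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : text.split("\\R")) {
            int bar = line.indexOf('|');
            // Drop the comment part, if any, then trim whitespace.
            String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
            if (content.isEmpty()) {
                continue; // blank line or comment-only line
            }
            for (String word : content.split("\\s+")) {
                words.add(word.toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical sample in the assumed format, not the real file.
        String sample = " | a sample list\nwe\nus   | object form\n\nour\n";
        System.out.println(parse(sample)); // prints [we, us, our]
    }
}
```

The resulting set can be converted with toArray(new String[0]) if the analyzer in use expects a String[] of stop words.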


-----Original Message-----
From: hgadm@cswebmail.com [mailto:hgadm@cswebmail.com]
Sent: Thursday, April 22, 2004 12:55 PM
To: lucene-dev@jakarta.apache.org
Subject: more rigid stopword list ?


Dear all,

for my taste the stopwords included in Lucene (e.g.
StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
with the SnowballAnalyzer - and I guess also with the
StandardAnalyzer) are not strict enough:

For example in a sentence with "we need ..." I would
consider "we" and "need" as stopwords but they are not
stripped by SnowballAnalyzer or StandardAnalyzer. 

Now:
Is there a built-in way to use more restrictive
stripping, or would I do better to create my own analyzer
with a more restrictive stopword list?

If so - are you aware of stricter lists? (A URI
would be great!)

Thanks,

Holger
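
One way to picture the "own, more restrictive list" option is to merge a base list with extra words and filter tokens against the result. This is only a sketch in plain Java, not Lucene code: the BASE array is a hypothetical stand-in for StopAnalyzer.ENGLISH_STOP_WORDS (its real contents are not reproduced here), EXTRA holds the two words from the example above, and the class name StricterStopWords is invented. Whether a given analyzer accepts a custom stop word array depends on the Lucene version, so verify that against its Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StricterStopWords {

    // Hypothetical stand-in for the default list; NOT Lucene's actual contents.
    static final String[] BASE = {"a", "an", "and", "the", "of", "to"};

    // Extra words the stricter list should also strip.
    static final String[] EXTRA = {"we", "need"};

    /** Combines two stop word arrays into one lowercase set. */
    public static Set<String> merged(String[] base, String[] extra) {
        Set<String> all = new LinkedHashSet<>();
        for (String w : base) all.add(w.toLowerCase());
        for (String w : extra) all.add(w.toLowerCase());
        return all;
    }

    /** Lowercases the text, splits on non-word characters, drops stop words. */
    public static List<String> strip(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = merged(BASE, EXTRA);
        System.out.println(strip("We need a faster index", stop)); // prints [faster, index]
        // The merged set as an array, for APIs that take String[].
        System.out.println(Arrays.toString(stop.toArray(new String[0])));
    }
}
```

Keeping the extra words in a separate array makes it easy to see what was added on top of the default list and to drop an addition that turns out to hurt recall.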







---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org



