lucene-java-user mailing list archives

From hg...@cswebmail.com
Subject Fwd: Re: more rigid stopword list ?
Date Fri, 23 Apr 2004 09:31:17 GMT
Apologies for posting it accidentally on the dev list...

------- Start of forwarded message -------

Subject: Re: more rigid stopword list ?
From: hgadm@cswebmail.com
Date: Fri, 23 Apr 2004 02:29:09 -0700 (PDT)
To: lucene-dev@jakarta.apache.org

Many thanks to Tate, Otis, Eric [again :-)] and David.

I am using the Snowball stemmer - so with the
overloaded constructor for Snowball I guess a call
would be:

new SnowballAnalyzer("English",
StopAnalyzer.MY_ENGLISH_STOP_WORDS);

where MY_ENGLISH_STOP_WORDS is a java.lang.String[] of
the stopwords I would like to use.

Is that the correct syntax for SnowballAnalyzer ?

Thanks again,

Holger
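
For illustration, here is a minimal plain-Java sketch of what a stricter
stop word list does to a sentence such as "we need ...". It does not use
Lucene at all: the class name StopWordSketch, the filter helper and the
contents of MY_ENGLISH_STOP_WORDS are assumptions made up for this example,
and the tokenizer is a simple split on non-letters rather than Lucene's
analysis chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical stricter list: a few common function words plus "we" and "need".
    public static final String[] MY_ENGLISH_STOP_WORDS = {
        "a", "an", "and", "the", "to", "of", "we", "need"
    };

    // Lowercase the text, split on runs of non-letters, and drop every stop word.
    public static List<String> filter(String text, String[] stopWords) {
        Set<String> stops = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // split() can yield an empty first token when the text starts with a non-letter
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [faster, index]
        System.out.println(filter("We need a faster index", MY_ENGLISH_STOP_WORDS));
    }
}
```

With Lucene itself the same String[] would presumably be passed as the
second argument of the analyzer constructor, as in the call quoted above,
instead of being applied by hand like this.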


On Thu, 22 Apr 2004 11:38:13 -0700, David Spencer wrote:

> 
> hgadm@cswebmail.com wrote:
> 
> > Dear all,
> > 
> > for my taste the stopwords included in Lucene (e.g.
> > StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
> > with the SnowballAnalyzer - and I guess also with the
> > StandardAnalyzer) are not strict enough:
> > 
> > For example in a sentence with "we need ..." I would
> > consider "we" and "need" as stopwords but they are not
> > stripped by SnowballAnalyzer or StandardAnalyzer.
> > 
> > Now:
> > Is there a built-in solution to use more restrictive
> > stripping, or should I rather create my own analyzer in
> > that case with a more restrictive stopword list ?
> > 
> > If so - are you aware of more rigid lists ? (a URI
> > would be great !)
> 
> Have you seen this:
> 
> http://www.onjava.com/onjava/2003/01/15/examples/EnglishStopWords.txt
> 
> Though personally I would start with the default
> assumption that stop word lists are not needed at all
> unless you can "prove" you need one, e.g.
> [1] the indexes are too big (though in theory this
> shouldn't happen because of stop words..)
> [2] you're doing some index analysis where you traverse
> terms and there are just too many
> 
> 
> 
> > 
> > Thanks,
> > 
> > Holger
> > 
> > ___________________________________________________
> > The ALL NEW CS2000 from CompuServe
> >  Better!  Faster! More Powerful!
> >  250 FREE hours! Sign-on Now!
> >  http://www.compuserve.com/trycsrv/cs2000/webmail/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


------- End of forwarded message -------

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

