lucene-java-user mailing list archives

From Leo Galambos <Le...@seznam.cz>
Subject Re: Where to get stopword lists?
Date Fri, 06 Jun 2003 17:11:20 GMT
Ulrich Mayring wrote:

> Hello,
>
> does anyone know of good stopword lists for use with Lucene? I'm 
> interested in English and German lists.

What does ``good'' mean? IMHO it depends on your corpus. The best way
to get a ``good'' stop list is an analysis based on idf (inverse
document frequency): index your documents, list all the terms with a
low idf, save them in a file, and use that file as the stop list in the
next indexing round.
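
A rough sketch of that idea in plain Python, independent of Lucene's API. The function name low_idf_terms, the max_idf threshold of 0.5, the formula idf(t) = ln(N / df(t)), and the toy documents are all assumptions for illustration. A real run would read the document frequencies from your own index and pick a threshold that suits the corpus.

```python
import math
from collections import Counter

def low_idf_terms(docs, max_idf=0.5):
    """Return terms whose idf is at or below max_idf, lowest idf first.

    docs is a list of token lists, one per document.
    idf(t) = ln(N / df(t)), where df(t) is the number of documents
    that contain t and N is the total number of documents.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    candidates = [term for term, value in idf.items() if value <= max_idf]
    # Sort by idf, then alphabetically so ties come out in a stable order.
    return sorted(candidates, key=lambda term: (idf[term], term))

# Toy corpus (hypothetical); in practice these would be your indexed documents.
docs = [
    "the cat sat on the mat".split(),
    "the dog ate the bone".split(),
    "a cat and a dog".split(),
    "the index of the book".split(),
]
stop_words = low_idf_terms(docs, max_idf=0.5)
```

The resulting list can then be written to a file, one term per line, and handed to the analyzer as its stop-word list for the next indexing round.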

Just a thought...

-g-




