lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Stop words (how to create ideal set of stop words?)
Date Fri, 11 May 2007 02:24:00 GMT
There is a handy class in contrib/misc.../ that will show you the most frequent terms in an
index. Handy dandy.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Lukas Vlcek <lukas.vlcek@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, May 10, 2007 2:39:35 PM
Subject: Stop words (how to create ideal set of stop words?)

Hi,

Can anybody point me to some references how to create an ideal set of stop
words? I konw that this is more like a theoretical question but how do
Luceners determine which words shuold be excluded when creating Analyzers
for a new languages? And which technique was used for validation of stop
word lists in current Analyzers?

More specificaly I am interested in situations when there is a need to build
a search engine around specific corpus (for example when we need to search
set of articles related to programming languages only). Given a specific
corpus is there any recommended technique of stop words derivation?

Thanks,
Lukas




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message