lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lukas Vlcek" <>
Subject Stop words (how to create ideal set of stop words?)
Date Thu, 10 May 2007 18:39:35 GMT

Can anybody point me to some references how to create an ideal set of stop
words? I konw that this is more like a theoretical question but how do
Luceners determine which words shuold be excluded when creating Analyzers
for a new languages? And which technique was used for validation of stop
word lists in current Analyzers?

More specificaly I am interested in situations when there is a need to build
a search engine around specific corpus (for example when we need to search
set of articles related to programming languages only). Given a specific
corpus is there any recommended technique of stop words derivation?


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message