lucene-java-user mailing list archives

From Twan Kogels <t...@twansoft.com>
Subject Dutch Analyzer dictionary format?
Date Fri, 26 Nov 2004 09:42:04 GMT
Hello all,

I'm using Lucene to search through a couple of documents to find 
interesting ones. Most of the documents are in Dutch. I noticed that the 
default Snowball stemmer wasn't handling text written in a foreign 
language well. Luckily, I found a Dutch text analyzer in the Lucene sandbox project.

I've read the Javadoc and found out that it needs a stem dictionary. You can load 
this dictionary with the following method:
DutchAnalyzer.setStemDictionary(File f)

The file needs to be a tab-separated list (word [tab] stem).
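A minimal sketch of reading the word [tab] stem format described above, using only the JDK. The class and method names (StemDictReader, parse) and the sample entries are hypothetical and made up for illustration; this is not DutchAnalyzer's own loader, whose handling of edge cases may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class StemDictReader {
    /**
     * Parses lines of the form "word<TAB>stem" into a word -> stem map.
     * Lines without a tab (or with an empty word) are skipped.
     */
    public static Map<String, String> parse(BufferedReader in) throws IOException {
        Map<String, String> dict = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // no tab, or nothing before it: not a valid entry
            }
            String word = line.substring(0, tab);
            String stem = line.substring(tab + 1);
            dict.put(word, stem);
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample entries, one "word<TAB>stem" pair per line.
        String sample = "huizen\thuis\nlopen\tlop\n";
        Map<String, String> dict = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(dict.get("huizen")); // prints "huis"
    }
}
```

The point is simply that each line carries exactly one mapping, with a single tab as the only separator between the word and its stem.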

To be sure I'm doing everything correctly, I have a question about the dictionary:
Can I just take
<http://snowball.tartarus.org/dutch/diffs.txt>,
convert it to a tab-separated list, and then "feed" it to the 
setStemDictionary() method?
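A sketch of the conversion described in the question, assuming each line of diffs.txt holds a word and its stem separated by one or more spaces (check the actual file layout before relying on this). The class name DiffsToTsv and the sample lines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class DiffsToTsv {
    /**
     * Converts lines holding a word and its stem separated by whitespace
     * (the assumed layout of diffs.txt) into "word<TAB>stem" lines.
     * Blank lines and lines without exactly two columns are skipped.
     */
    public static void convert(BufferedReader in, Writer out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length != 2) {
                continue; // blank or malformed line
            }
            out.write(cols[0]);
            out.write('\t');
            out.write(cols[1]);
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical two-column sample padded with spaces.
        String sample = "huizen      huiz\nlopen       lop\n";
        StringWriter out = new StringWriter();
        convert(new BufferedReader(new StringReader(sample)), out);
        System.out.print(out); // "huizen<TAB>huiz" and "lopen<TAB>lop", one per line
    }
}
```

For real files, open the reader and writer with an explicit charset (for example `java.nio.file.Files.newBufferedReader(path, charset)`) rather than the platform default, since Dutch words can contain accented characters such as ë or ï.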

Kind regards,
Twan Kogels 



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

