lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Muir <rcm...@gmail.com>
Subject Re: Where to find non-English dictionaries, thesaurus, synonyms
Date Fri, 07 Jan 2011 21:26:37 GMT
On Thu, Jan 6, 2011 at 11:53 AM, Pulkit Singhal <pulkitsinghal@gmail.com> wrote:
> Hello,
>
> What's a good source to get dictionaries (for spellcorrections) and/or
> thesaurus (for synonyms) that can be used with Lucene for non-English
> languages such as Fresh, Chinese, Korean etc?

if you can't find a wordlist of correctly-spelled words somewhere
else, you can always try
http://wiki.services.openoffice.org/wiki/Dictionaries, grab the
openoffice spellchecker dictionary for that language, and use the
hunspell "unmunch" command (sort of like morphological generation) to
generate a list of words you could then use with PlainTextDictionary.

>
> For example, the wordnet contrib module is based on the data set
> provided by the Princeton based wordnet system but I'm wondering where
> the Lucene users go for similar reliable source for other languages?
>

in this case i would also investigate the openoffice thesaurus data,
if you cant find anything else.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message