lucene-java-user mailing list archives

From Paul Libbrecht <p...@hoplahup.net>
Subject Re: Where to find non-English dictionaries, thesaurus, synonyms
Date Fri, 07 Jan 2011 21:35:46 GMT
Somehow, I had the impression that the TrebleCLEF and EuroMatrix European projects were meant
to gather this kind of information source.

But honestly, what they offer is not as homogeneous as the OpenOffice dictionaries.
Mozilla also has dictionaries.
Wiktionary can also be helpful.

paul


On 7 Jan 2011, at 22:26, Robert Muir wrote:

> On Thu, Jan 6, 2011 at 11:53 AM, Pulkit Singhal <pulkitsinghal@gmail.com> wrote:
>> Hello,
>> 
>> What's a good source for dictionaries (for spelling correction) and/or
>> thesauri (for synonyms) that can be used with Lucene for non-English
>> languages such as French, Chinese, Korean, etc.?
> 
> If you can't find a wordlist of correctly spelled words somewhere
> else, you can always try
> http://wiki.services.openoffice.org/wiki/Dictionaries, grab the
> OpenOffice spellchecker dictionary for that language, and use the
> hunspell "unmunch" command (sort of like morphological generation) to
> generate a list of words you could then use with PlainTextDictionary.
> 
>> 
>> For example, the WordNet contrib module is based on the data set
>> provided by the Princeton-based WordNet system, but I'm wondering where
>> Lucene users go for similarly reliable sources for other languages?
>> 
> 
> In this case I would also investigate the OpenOffice thesaurus data,
> if you can't find anything else.
> 
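
For the unmunch route Robert describes above, the generated list usually needs a little tidying before it is handed to PlainTextDictionary, which expects one word per line. Below is a minimal sketch of that tidying step using only the JDK. It assumes the unmunch output has already been saved to a text file, and that you know its encoding (hunspell dictionaries normally declare it on the SET line of the .aff file). The class and method names (WordListCleaner, clean, cleanFile) are made up for illustration and are not part of Lucene or hunspell.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene or hunspell: turns a raw word
// list (for example saved unmunch output) into one unique word per line.
final class WordListCleaner {

    // Trims each line, skips blank lines, removes duplicates and returns
    // the words in natural (code point) order.
    static List<String> clean(List<String> rawLines) {
        TreeSet<String> words = new TreeSet<>();
        for (String line : rawLines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return new ArrayList<>(words);
    }

    // Reads the raw list in the given charset (assumption: the caller looked
    // it up in the dictionary's .aff file) and writes the cleaned list as UTF-8.
    static void cleanFile(Path in, Charset inCharset, Path out) throws IOException {
        List<String> cleaned = clean(Files.readAllLines(in, inCharset));
        Files.write(out, cleaned, StandardCharsets.UTF_8);
    }
}
```

The resulting UTF-8 file should then be usable as the input for PlainTextDictionary. Re-encoding to UTF-8 is a choice made here to keep the later steps free of charset guessing, not something the tools require.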



