lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: German decompounding/tokenization with Lucene?
Date Sat, 16 Sep 2017 07:29:33 GMT
Hi Mike. Search the lucene-dev archives: I did write a decompounder with Daniel
Naber. The quality was not ideal, but perhaps better than nothing. Also,
Daniel works on languagetool.org, I believe; they should have something in there.

Dawid

On Sep 16, 2017 1:58 AM, "Michael McCandless" <lucene@mikemccandless.com>
wrote:

> Hello,
>
> I need to index documents with German text in Lucene, and I'm wondering how
> people have done this in the past.
>
> Lucene already has a DictionaryCompoundWordTokenFilter ... is this what
> people use?  Are there good, open-source friendly German dictionaries
> available?
>
> Thanks,
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
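For reference, a minimal sketch of wiring DictionaryCompoundWordTokenFilter
into an Analyzer might look like the following (assuming a Lucene 7.x-era API;
the four-entry dictionary and the GermanDecompoundAnalyzer class name are
stand-ins, a real setup would load a full German word list):

    import java.util.Arrays;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;

    public class GermanDecompoundAnalyzer extends Analyzer {

        // Stand-in dictionary for illustration only; load a real
        // German word list here (lowercased, since we lowercase first).
        private final CharArraySet dictionary = new CharArraySet(
                Arrays.asList("donau", "dampf", "schiff", "fahrt"), true);

        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new StandardTokenizer();
            TokenStream result = new LowerCaseFilter(source);
            // Emits dictionary subwords alongside the original compound token,
            // e.g. "Donaudampfschiff" -> donaudampfschiff, donau, dampf, schiff.
            result = new DictionaryCompoundWordTokenFilter(result, dictionary);
            return new TokenStreamComponents(source, result);
        }
    }

If I remember right, the filter by default only splits tokens of at least 5
characters and keeps subwords between 2 and 15 characters; the longer
constructor exposes those limits plus an onlyLongestMatch flag.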
