lucene-java-user mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: German decompounding/tokenization with Lucene?
Date Sat, 16 Sep 2017 07:41:32 GMT
+1; some time ago I also used the decompounder Dawid mentions and was
satisfied with it back then.

Regards,
Tommaso


On Sat, Sep 16, 2017 at 09:29 Dawid Weiss <dawid.weiss@gmail.com>
wrote:

> Hi Mike. Search the Lucene dev archives. I did write a decompounder with Daniel
> Naber. The quality was not ideal, but perhaps better than nothing. Also,
> Daniel works on languagetool.org? They should have something in there.
>
> Dawid
>
> On Sep 16, 2017 1:58 AM, "Michael McCandless" <lucene@mikemccandless.com>
> wrote:
>
> > Hello,
> >
> > I need to index documents with German text in Lucene, and I'm wondering
> how
> > people have done this in the past?
> >
> > Lucene already has a DictionaryCompoundWordTokenFilter ... is this what
> > people use?  Are there good, open-source friendly German dictionaries
> > available?
> >
> > Thanks,
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
>
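For reference, the DictionaryCompoundWordTokenFilter that Mike asks about can be
wired up roughly as below. This is a minimal sketch, not a recommended setup: the
three-word dictionary is purely illustrative (a real deployment would load a full
open-source German word list), and note that CharArraySet lived in
org.apache.lucene.analysis.util in Lucene 6.x and earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class DecompoundDemo {

    /** Tokenizes {@code text} on whitespace and decompounds each token
     *  against the given dictionary of known word parts. */
    public static List<String> decompound(String text, Collection<String> parts)
            throws IOException {
        // Illustrative dictionary of word parts (case-insensitive).
        CharArraySet dictionary = new CharArraySet(parts, true);

        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));

        // minWordSize=5, minSubwordSize=2, maxSubwordSize=15 match the
        // filter's documented defaults; onlyLongestMatch=false keeps
        // every dictionary part found inside the compound.
        TokenStream stream = new DictionaryCompoundWordTokenFilter(
                tokenizer, dictionary, 5, 2, 15, false);

        List<String> tokens = new ArrayList<>();
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(term.toString());
        }
        stream.end();
        stream.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // The filter keeps the original compound token and adds the
        // dictionary parts it finds inside it.
        System.out.println(decompound("donaudampfschiff",
                java.util.Arrays.asList("donau", "dampf", "schiff")));
    }
}
```

The filter's quality is bounded by the dictionary: it emits every dictionary
word it finds inside a compound, with no grammatical awareness. Lucene also
ships HyphenationCompoundWordTokenFilter, which uses hyphenation patterns to
constrain the split points.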
