lucene-dev mailing list archives

From "Dawid Weiss (JIRA)" <>
Subject [jira] Commented: (LUCENE-2298) Polish Analyzer
Date Mon, 22 Mar 2010 08:18:27 GMT


Dawid Weiss commented on LUCENE-2298:

Staszek suggested that it might be convenient if this patch detected whether another Polish
stemming library is present on the classpath and, if so, used it. The library in mind
is "morfologik-stemming", here:

The code of this library is BSD-licensed and consists mainly of FSA (finite-state automaton)
traversal. The stemmer is dictionary-based, so it is nearly 100% accurate for words in the
dictionary (ambiguities aside) and 0% accurate for non-dictionary words (it returns null).
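
To make the suggestion concrete, here is a rough sketch of how the detection and the
fallback could look. It is only an illustration under assumptions: the class name
"org.example.stemming.DictionaryStemmer" is a hypothetical placeholder, not Morfologik's
actual API, and both stemmers are modelled as plain functions. The sample word pair in the
usage note is toy data.

```java
import java.util.function.Function;

final class OptionalStemmerLookup {

    // Hypothetical class name, used only for illustration. Substitute the
    // real entry point of whichever optional stemming library is targeted.
    static final String OPTIONAL_STEMMER_CLASS =
            "org.example.stemming.DictionaryStemmer";

    /** Returns true if the named class can be located on the classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only check presence, do not run static initializers.
            Class.forName(className, false,
                    OptionalStemmerLookup.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Tries the dictionary stemmer first. A dictionary-based stemmer returns
     * null for words it does not know, so in that case the fallback
     * (for example, a table-driven algorithmic stemmer) is used instead.
     */
    static String stem(String word,
                       Function<String, String> dictionary,
                       Function<String, String> fallback) {
        String stem = dictionary.apply(word);
        return stem != null ? stem : fallback.apply(word);
    }
}
```

An analyzer could call isOnClasspath(OPTIONAL_STEMMER_CLASS) once at construction time and
wire in the dictionary stemmer only when it returns true. With a toy dictionary mapping
"kotami" to "kot", stem("kotami", dict::get, w -> w) takes the dictionary answer, while an
unknown word falls through to the fallback.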

The problem with Morfologik is that its dictionary data is LGPL-licensed, so it would have
to be a separate download.

This is just a suggestion for discussion. I guess this functionality is limited to a very
narrow audience anyway.

> Polish Analyzer
> ---------------
>                 Key: LUCENE-2298
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.1
>         Attachments: LUCENE-2298.patch, stemmer_20000.7z
> Andrzej Bialecki has written a Polish stemmer and provided stemming tables for it under
> Apache License.
> You can read more about it here:
> In reality, the stemmer is general code and we could use it for more languages too perhaps.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

