lucene-dev mailing list archives

From "Dawid Weiss (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2341) explore morfologik integration
Date Tue, 28 Jun 2011 10:50:17 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056442#comment-13056442 ]

Dawid Weiss commented on LUCENE-2341:
-------------------------------------

I've cleaned up the patch, but I'd still like to address the two TODOs that I left in the code:

- lowercasing should not be done at the external filter level, but inside the filter, as a
fallback, IF AND ONLY IF the original sequence is not found in the dictionary. Morfeusz and
Morfologik do have uppercase surface forms and treat them differently (returning uppercase
lemmas, for example). A test for this would be nice as well. Examples of uppercase/mixed-case
surface forms: AGD, Aaron, Poznania.

- I'd expose another attribute with morphosyntactic annotations -- this is something that
is there anyway, so why not expose it.
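
A minimal sketch of what the two TODOs above could look like, under stated assumptions: the
in-memory map stands in for the real dictionary, and the class name CaseFallbackLookup, its
methods, the sample entries and the tag strings are all hypothetical. None of this is the
Morfologik or Lucene API. It only illustrates trying the original surface form first, lowercasing
only when that lookup misses, and returning the annotation alongside the lemma.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a dictionary-backed stemmer; NOT the Morfologik API. */
final class CaseFallbackLookup {

    /** A lemma together with its morphosyntactic annotation. */
    static final class Analysis {
        final String lemma;
        final String tag;

        Analysis(String lemma, String tag) {
            this.lemma = lemma;
            this.tag = tag;
        }
    }

    // Assumption: a plain map replaces the real FSA-based dictionary.
    private final Map<String, List<Analysis>> dictionary = new HashMap<>();

    void add(String surface, String lemma, String tag) {
        dictionary.computeIfAbsent(surface, k -> new ArrayList<>())
                  .add(new Analysis(lemma, tag));
    }

    /** Looks up the form as given; lowercases only if that lookup finds nothing. */
    List<Analysis> lookup(String surface) {
        List<Analysis> exact = dictionary.get(surface);
        if (exact != null) {
            return exact; // uppercase/mixed-case entries keep their own lemmas
        }
        String lowered = surface.toLowerCase(Locale.ROOT);
        if (lowered.equals(surface)) {
            return new ArrayList<>(); // already lowercase, nothing left to try
        }
        List<Analysis> fallback = dictionary.get(lowered);
        return fallback != null ? fallback : new ArrayList<>();
    }

    /** Builds a tiny sample dictionary; entries and tags are illustrative placeholders. */
    static CaseFallbackLookup demo() {
        CaseFallbackLookup d = new CaseFallbackLookup();
        d.add("AGD", "AGD", "TAG_ACRONYM");
        d.add("Aaron", "Aaron", "TAG_PROPER");
        d.add("domy", "dom", "TAG_COMMON");
        return d;
    }

    public static void main(String[] args) {
        CaseFallbackLookup d = demo();
        for (String token : new String[] {"AGD", "Aaron", "Domy", "agd"}) {
            StringBuilder sb = new StringBuilder(token).append(" ->");
            for (Analysis a : d.lookup(token)) {
                sb.append(' ').append(a.lemma).append('/').append(a.tag);
            }
            System.out.println(sb);
        }
    }
}
```

In this sketch "Domy" (for example, sentence-initial) resolves through the lowercase fallback,
while "AGD" and "Aaron" are matched as written. Lowercasing every token before the lookup would
make those two entries unreachable.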

I attached a git diff, but it should apply with patch -p1 < ... too. Michał, will you
have the time to polish this off?

> explore morfologik integration
> ------------------------------
>
>                 Key: LUCENE-2341
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2341
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Assignee: Dawid Weiss
>         Attachments: LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.diff, LUCENE-2341.patch, morfologik-fsa-1.5.2.jar, morfologik-polish-1.5.2.jar, morfologik-stemming-1.5.2.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option for users.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
