lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www.apertium.org)
Date Sat, 21 Feb 2009 18:22:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675589#action_12675589 ]

Mark Miller commented on LUCENE-1284:
-------------------------------------

Hadn't seen this before. Thanks Felipe! This looks like a high quality contribution.

I've expanded the attached file into contrib, built it, and ran the tests. Everything went
smoothly.

I've only begun to look at the code myself, but here are a couple of initial comments:

Could you remove the @author tags? The Lucene project has decided it's best to leave them out
(you can search the mailing list if you are interested in the discussion).

How about renaming overview.html to package.html and expanding what you have there? This looks
like a very useful addition, but it's complicated enough to merit a more thorough overview
and/or examples of how to get started. Not everyone wades into the contrib packages that often,
so let's hook those who do by providing a very clear: "This is what this is, this is what you
can do with it, and here is how you do it". Nothing too intense, but enough to understand
its usefulness quickly (and to let readers gauge the effort required to use it).

As an example of information that seems to be missing: where do I get the data files?
I see a link to http://www.apertium.org, but digging a bit does not immediately show me what
I am looking for. Clear instructions for getting started with your preferred morphological
data files, including where and how to obtain those files, would be great.

Thanks for donating this code! It's something I have been interested in seeing added to Lucene
for some time.

- Mark

> Set of Java classes that allow the Lucene search engine to use morphological information
> developed for the Apertium open-source machine translation platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should work on any other
>                      Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.2008-05-19.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological information
> developed for the Apertium open-source machine translation platform (http://www.apertium.org).
> Morphological information is used to index new documents and to process smarter queries in
> which morphological attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for the open-source
> machine translation platform Apertium (http://apertium.org) and, optionally, the part-of-speech
> taggers developed for it. Currently there are morphological dictionaries available for Spanish,
> Catalan, Galician, Portuguese, Aranese, Romanian, French and English. In addition, new
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish, Danish, Welsh,
> Polish and Italian, among others; we hope more language pairs will be added to the
> Apertium machine translation platform in the near future.
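
The description says morphological information is used both to index documents and to let
query terms carry morphological attributes. As a rough illustration only, here is a minimal
self-contained Java sketch, assuming analyser output in the style of Apertium's stream format
(^surface/lemma<tag>...$, with one /-separated analysis per reading). It is NOT the code in
apertium-morph.2008-05-19.tgz: the class name, the "lemma|tag" term scheme, and the sample
line and its tags are hypothetical, and escape handling and tagger disambiguation are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the API of the attached contrib code.
public final class ApertiumStreamSketch {

    // One analysis of a surface form: a lemma plus its morphological tags.
    static final class Analysis {
        final String lemma;
        final List<String> tags;

        Analysis(String lemma, List<String> tags) {
            this.lemma = lemma;
            this.tags = tags;
        }
    }

    // A lexical unit is delimited by '^' and '$'; escaped characters are not handled here.
    private static final Pattern UNIT = Pattern.compile("\\^([^$]*)\\$");
    // A single tag such as <n> or <pl>.
    private static final Pattern TAG = Pattern.compile("<([^>]+)>");

    // Returns, for each lexical unit in the stream, all of its candidate analyses.
    static List<List<Analysis>> parse(String stream) {
        List<List<Analysis>> units = new ArrayList<>();
        Matcher unit = UNIT.matcher(stream);
        while (unit.find()) {
            String[] parts = unit.group(1).split("/");
            List<Analysis> analyses = new ArrayList<>();
            // parts[0] is the surface form; every later part is one analysis.
            for (int i = 1; i < parts.length; i++) {
                String part = parts[i];
                int firstTag = part.indexOf('<');
                String lemma = firstTag < 0 ? part : part.substring(0, firstTag);
                List<String> tags = new ArrayList<>();
                Matcher tag = TAG.matcher(part);
                while (tag.find()) {
                    tags.add(tag.group(1));
                }
                analyses.add(new Analysis(lemma, tags));
            }
            units.add(analyses);
        }
        return units;
    }

    // Assumed term scheme: the bare lemma plus "lemma|tag" for each tag, without duplicates.
    static List<String> indexTerms(String stream) {
        Set<String> terms = new LinkedHashSet<>();
        for (List<Analysis> analyses : parse(stream)) {
            for (Analysis a : analyses) {
                terms.add(a.lemma);
                for (String t : a.tags) {
                    terms.add(a.lemma + "|" + t);
                }
            }
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        // Illustrative input: "houses" has two readings because no tagger is applied.
        String sample = "^houses/house<n><pl>/house<vblex><pri><p3><sg>$";
        System.out.println(indexTerms(sample));
        // prints [house, house|n, house|pl, house|vblex, house|pri, house|p3, house|sg]
    }
}
```

Indexing every reading keeps recall high but adds spurious terms for ambiguous words; running
one of the optional part-of-speech taggers first would, under this sketch's assumptions, leave
a single analysis per unit and therefore fewer terms.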
