lucene-dev mailing list archives

From Felipe Sánchez Martínez (JIRA) <j...@apache.org>
Subject [jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www.apertium.org)
Date Tue, 28 Apr 2009 15:10:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703670#action_12703670 ]

Felipe Sánchez Martínez commented on LUCENE-1284:
-------------------------------------------------

Hi, 

I think that the tool's reliance on an external free/open-source package to pre-process
the files to be indexed should not prevent the community from benefiting from it;
the world is pretty heterogeneous ;). Furthermore, the package is not required at search time.

> Felipe, although Java equivalents of those command-line tools don't exist currently,
> do you think one could implement them in Java (and release them under ASL)?

This year the Apertium project is taking part in the Google Summer of Code. A student will port the lttoolbox
package to Java. Note that the tool I contributed also uses the Apertium tagger, which
will not be ported; fortunately, use of the tagger is optional. The Java version
of lttoolbox will be released under the GPL; I am not sure whether they will agree to
dual-license it.

--
Felipe

> Set of Java classes that allow the Lucene search engine to use morphological information
> developed for the Apertium open-source machine translation platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should work on any
> other Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological information
> developed for the Apertium open-source machine translation platform (http://www.apertium.org).
> Morphological information is used to index new documents and to process smarter queries in
> which morphological attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for the open-source
> machine translation platform Apertium (http://apertium.org) and, optionally, the part-of-speech
> taggers developed for it. Currently there are morphological dictionaries available for Spanish,
> Catalan, Galician, Portuguese, Aranese, Romanian, French and English. In addition, new
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish, Danish, Welsh,
> Polish and Italian, among others; we hope more language pairs will be added to the
> Apertium machine translation platform in the near future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



