lucene-dev mailing list archives

From Felipe Sánchez Martínez (JIRA) <j...@apache.org>
Subject [jira] Commented: (LUCENE-1284) Set of Java classes that allow the Lucene search engine to use morphological information developed for the Apertium open-source machine translation platform (http://www.apertium.org)
Date Tue, 14 Apr 2009 10:47:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698732#action_12698732 ]

Felipe Sánchez Martínez commented on LUCENE-1284:
-------------------------------------------------

Hi Otis,

The Java code I contributed is dual-licensed under the ASL and GPLv2. The Apertium tools and
data are GPLv2.


>  Why are they in pairs? Is that simply for the translation part of Apertium, and  something
that's ignored when you use the pair for Lucene and morphological analysis?

Yes, they come as language pairs because of translation. If you are not interested in translation
(as in our case), you can use whichever language pair contains the language you are interested
in; choose the language pair with the highest number of lemmata, which is probably the one with
the highest version number.

> Do you mind replacing the deprecated Hits object in the Searcher class?

Which new class should I use instead?

> Could you explain why the removal of multiword expressions is needed?

Multiword units need to be removed from the dictionary mainly because they are there only to
facilitate the correct translation of some expressions into the target language. This is not
specific to Spanish and should be done in all cases.
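
As a rough illustration only (this is not the code in the attached archive), the removal step
could be sketched as below. It assumes the dictionary is an Apertium .dix XML file in which
multiword entries contain <b/> blank elements; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: names are illustrative, not part of apertium-morph.
public class MultiwordFilter {

    // Returns the "lm" attribute of every <e> entry that has no <b/> descendant.
    // Assumption: a <b/> element inside an entry marks it as a multiword unit.
    public static List<String> singleWordLemmas(String dixXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(dixXml)));
        NodeList entries = doc.getElementsByTagName("e");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (entry.getElementsByTagName("b").getLength() == 0) {
                kept.add(entry.getAttribute("lm"));
            }
        }
        return kept;
    }

    public static void main(String[] args) throws Exception {
        String dix = "<dictionary><section id=\"main\" type=\"standard\">"
                + "<e lm=\"casa\"><p><l>casa</l><r>casa<s n=\"n\"/></r></p></e>"
                + "<e lm=\"a pesar de\"><p><l>a<b/>pesar<b/>de</l>"
                + "<r>a<b/>pesar<b/>de<s n=\"pr\"/></r></p></e>"
                + "</section></dictionary>";
        // Prints the lemmas that survive the filter.
        System.out.println(singleWordLemmas(dix));
    }
}
```

The sketch only lists the surviving lemmata; a real filter would write the kept entries back out
as a new dictionary before compiling it.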


> So these are a few command-line tools that end up marking up the input text with POS?


Yes. 

> I seem to be missing some libraries and can't compile Apertium locally to check what
this marked-up file looks like.

You need to install lttoolbox; you can download it from the Apertium web page.

> But my main question here is whether there are Java equivalents of these command-line
tools,

Unfortunately, no :(

Regards.
--
Felipe

> Set of Java classes that allow the Lucene search engine to use morphological information
developed for the Apertium open-source machine translation platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should work on any
other Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological information
developed for the Apertium open-source machine translation platform (http://www.apertium.org).
Morphological information is used to index new documents and to process smarter queries in
which morphological attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for the open-source
machine translation platform Apertium (http://apertium.org) and, optionally, the part-of-speech
taggers developed for it. Currently there are morphological dictionaries available for Spanish,
Catalan, Galician, Portuguese, Aranese, Romanian, French and English. In addition, new
dictionaries are being developed for Esperanto, Occitan, Basque, Swedish, Danish, Welsh, Polish
and Italian, among others; we hope more language pairs will be added to the Apertium machine
translation platform in the near future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



