nutch-user mailing list archives

From Hannu Väisänen <hvais...@joyx.joensuu.fi>
Subject Malaga-fi - Finnish plugin for Nutch - a new version
Date Thu, 03 Sep 2009 12:48:38 GMT
I have released a new version of malaga-fi.

Changes from the previous version: malaga-fi now recognizes some
common spelling errors.



Malaga-fi is a Nutch plugin for indexing documents written in Finnish.


Malaga-fi analyses words morphologically, converts them to their base
forms (the forms you find in dictionaries) and indexes the base forms,
so that you can find all inflected forms of a word by searching for
its base form.

To use an English example, if you search for the word "give" you find
all documents that have "give", "gives", "gave", "given", or "giving".
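
As a rough illustration of the idea (not malaga-fi's actual code), here is a
minimal Java sketch of base-form indexing. The class name, the tiny lookup
table, and the method names are all hypothetical; a real analyser such as
Malaga derives base forms from morphological rules rather than from a
hand-written table.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BaseFormIndexSketch {
    // Hypothetical toy table standing in for a morphological analyser.
    private static final Map<String, String> LEMMAS = Map.of(
        "gives", "give",
        "gave", "give",
        "given", "give",
        "giving", "give");

    // Reduce a word to its base form; unknown words are kept as-is (lowercased).
    static String baseForm(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Build an inverted index from base form to the ids of documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(baseForm(token), k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // The query is reduced to its base form too, so any inflection matches.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        return index.getOrDefault(baseForm(query), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "She gave a talk",
            "He gives advice",
            "Nothing relevant here",
            "Giving is good");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(search(index, "give")); // prints [0, 1, 3]
    }
}
```

Because both the indexed tokens and the query go through the same base-form
step, searching for "give" or "gave" returns the same documents.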

This is especially important for Finnish, since a Finnish word can
have literally tens of thousands of inflected forms.


What you need:

1. Malaga programming language.
   http://home.arcor.de/bjoern-beutel/malaga/


2. Suomimalaga - Description of Finnish morphology written in Malaga.
   http://sourceforge.net/project/showfiles.php?group_id=156731

   Newest version:
   svn co https://voikko.svn.sourceforge.net/svnroot/voikko/trunk/suomimalaga


3. Malaga-Java - Java interface to Malaga.
   http://joyds1.joensuu.fi/programs/index.html

   Malaga-Java has two versions; both are in the same file.
   You need the thread-safe version.


4. Malaga-fi - Nutch plugin for documents written in Finnish.
   http://joyds1.joensuu.fi/programs/index.html


5. Nutch: http://lucene.apache.org/nutch/


Malaga-fi is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
