uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Ruta and Morphology Analyzing
Date Thu, 26 Nov 2015 08:12:30 GMT
Hi,

yes and no.

UIMA Ruta rules can each be interpreted as a single FST, but they are not
compiled into a single automaton as one would normally expect. There is a
prototypical implementation of such an FST, but it supports only a minimal
subset of the language. UIMA Ruta is more an imperative language than a
declarative one, since the user can influence, and has to take care of,
the execution logic. In my experience (including larger industrial
projects), this has not yet been an impediment.

There is a trie-like implementation for dictionary lookup in UIMA Ruta:
the wordlists (twl and mtwl) and wordtables. They operate not on tokens but
on RutaBasic annotations, which means that you can apply the dictionary
to subtoken spans. You do, however, need some segmentation logic beforehand.
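
To illustrate the general idea of trie-based dictionary lookup on spans
smaller than a token, here is a minimal sketch in plain Java. It is not
UIMA Ruta's actual implementation and does not use any UIMA API; the class
name SubtokenTrie and its methods are made up for this example, and it
works on characters instead of RutaBasic annotations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: a character trie that reports every
// dictionary entry occurring anywhere inside a string, including inside
// a single token.
final class SubtokenTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a dictionary entry ends at this node
    }

    private final Node root = new Node();

    // Inserts one dictionary entry, character by character.
    void add(String entry) {
        Node node = root;
        for (int i = 0; i < entry.length(); i++) {
            node = node.children.computeIfAbsent(entry.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Returns the [begin, end) character offsets of every entry found in text.
    // A walk through the trie is started at every offset, so matches do not
    // have to be aligned with token boundaries.
    List<int[]> findAll(String text) {
        List<int[]> spans = new ArrayList<>();
        for (int begin = 0; begin < text.length(); begin++) {
            Node node = root;
            for (int end = begin; end < text.length(); end++) {
                node = node.children.get(text.charAt(end));
                if (node == null) {
                    break; // no entry continues with this character
                }
                if (node.terminal) {
                    spans.add(new int[] {begin, end + 1});
                }
            }
        }
        return spans;
    }
}
```

With the entries "un" and "ness" added, findAll("unhappiness") reports the
spans [0, 2) and [7, 11). Deciding which of the matched spans form a valid
analysis is the segmentation logic that still has to be supplied separately.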

So, a summarizing answer would be:
Yes, you can probably use some sort of trie together with UIMA Ruta rules
for your task, but neither is compiled into an FST. Only the simple
dictionaries are transformed into an improved data structure.

Best,

Peter

On 24.11.2015 at 17:45, d.heidarpour wrote:
>  
>
> Hi, 
>
> I'm trying to implement a Morphology Analyzer (AE) for Farsi in UIMA. I
> need a way to compile my word list and rules so that they can be queried
> by the AE for both bottom-up and top-down morphological analysis of Farsi
> words. There are a few FST libraries in Java for this task. But my
> question is: can I use UIMA Ruta for this directly? Or can I use it in a
> way that compiles the words and rules into a structure like a trie?
>
> Thanks 
>
> ~Davood Heidarpour 
>  

