uima-user mailing list archives

From <Armin.Weg...@bka.bund.de>
Subject Re: RUTA and shared resources
Date Fri, 23 Jan 2015 07:59:33 GMT
Hi Peter!

Thanks for your help. I will look at it.
At least for now, greedy anchoring and markfast work as expected. But I've used only short
word lists with simple entries.

Cheers,
Armin





-----Original Message-----
From: Peter Klügl [mailto:pkluegl@uni-wuerzburg.de]
Sent: Thursday, 22 January 2015 11:24
To: user@uima.apache.org
Subject: Re: RUTA and shared resources

Hi,

On 22.01.2015 at 09:20, Armin.Wegner@bka.bund.de wrote:
> Hello!
>
> This is a very short and simple gazetteer using RUTA.
>
> Document{->GREEDYANCHORING(true)};
> %s*{->MARKFAST(%s,'%s')};

First of all, I am sorry that I was not yet able to implement the greedy
matching for the gazetteers/wordlists. I have not forgotten it.
Just curious: does the rule perform as you expect/intend? I mean the
combination of greedy anchoring and the windowed stream caused by the
matching condition.


>
> where the first %s is replaced using String.format() by the name of
> the source type, the second %s is replaced by the target type name, and
> the third %s is replaced by the URL of a word list. This makes it a
> little more flexible. This is done once in
> CasAnnotator_ImplBase.initialize().
>
> Then the script is executed with Ruta.apply(cas, script) in process().
> But that means that the word list is read again for every CAS processed.
> Is there any way to have RUTA use the word list as a
> SharedResourceObject, so that it is read once only?
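For readers following along, the script construction described in the question can be sketched in plain Java as below. This is only an illustration of the String.format() step: the class name GazetteerScript, the method build(), and the example values "Token", "Person" and the file URL are hypothetical and do not come from the original message.

```java
// Minimal sketch of filling the three %s placeholders of the rule template.
// GazetteerScript and build() are hypothetical names used for illustration.
final class GazetteerScript {

    // Template as quoted in the message: source type, target type, word list URL.
    private static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n"
            + "%s*{->MARKFAST(%s,'%s')};";

    // Returns the script text with the placeholders substituted in order.
    static String build(String sourceType, String targetType, String wordListUrl) {
        return String.format(TEMPLATE, sourceType, targetType, wordListUrl);
    }

    public static void main(String[] args) {
        // Example values are assumptions, not taken from the thread.
        System.out.println(build("Token", "Person", "file:/data/persons.txt"));
    }
}
```

Building the string once in initialize() is cheap either way; the repeated cost discussed in this thread comes from how the script is executed, not from formatting it.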

The problem is that Ruta.apply() creates a new descriptor and a new
analysis engine on every call. You could instead keep the Ruta analysis
engine as a field in your own analysis engine, create it in your
initialize(), and call its process() from your process() method. Then
the word lists should not be reloaded for each process().
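The difference between the two approaches can be illustrated with a small, library-free sketch. It does not use the UIMA or Ruta APIs: WordListEngine is a hypothetical stand-in for an engine that reads its word list when it is created, and the two annotator classes are equally hypothetical. In a real annotator the field would presumably hold a UIMA AnalysisEngine (for example one obtained from UIMAFramework.produceAnalysisEngine()), but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an engine that loads its word list on creation.
final class WordListEngine {
    // Counts how often a word list has been loaded, to make the cost visible.
    static final AtomicInteger LOADS = new AtomicInteger();
    private final List<String> words;

    WordListEngine(List<String> source) {
        LOADS.incrementAndGet();              // models the expensive read
        this.words = new ArrayList<>(source); // private copy of the entries
    }

    boolean matches(String token) {
        return words.contains(token);
    }
}

// Creates a new engine for every document, analogous to calling
// Ruta.apply() inside process(): the word list is loaded each time.
final class PerCallAnnotator {
    private final List<String> source;

    PerCallAnnotator(List<String> source) {
        this.source = source;
    }

    boolean process(String token) {
        return new WordListEngine(source).matches(token);
    }
}

// Creates the engine once in initialize() and reuses it from a field:
// the word list is loaded a single time for all documents.
final class CachedAnnotator {
    private WordListEngine engine;

    void initialize(List<String> source) {
        engine = new WordListEngine(source);
    }

    boolean process(String token) {
        return engine.matches(token);
    }
}
```

Processing three documents with PerCallAnnotator increments LOADS three times, while CachedAnnotator increments it once during initialize() and not at all during process().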

As for SharedResourceObject: This should be done, but it was never at
the top of my todo list. I hope I will find the time sometime.

You may want to take a look at UIMA-4062 and UIMA-4074, especially
Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a
table using external resources. That could work for you as well. Maybe
Silvestre can share his experiences?

Best,

Peter

>
> Regards,
> Armin

