uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: RUTA and shared resources
Date Tue, 27 Jan 2015 13:12:30 GMT

We are currently waiting for the uimaj release, and there are still some
open issues. The end of February could be realistic.



On 27.01.2015 at 13:40, Armin.Wegner@bka.bund.de wrote:
> Hi!
> Looks good, but it is not part of the current release. It's not so urgent that I would
> deviate from the current stable release. Any idea when 2.3.0 will be released?
> Thanks,
> Armin
> -----Original Message-----
> From: Silvestre Losada [mailto:silvestre.losada@gmail.com]
> Sent: Sunday, 25 January 2015 08:42
> To: user@uima.apache.org
> Subject: Re: RUTA and shared resources
> Hi Armin,
> Apologies for the late response. I was able to load a data table as an external
> resource; I think the example shown in the comment is self-explanatory.
> If you have any issues loading it, please contact me.
> Kind regards.
> On 23 January 2015 at 08:59, <Armin.Wegner@bka.bund.de> wrote:
>> Hi Peter!
>> Thanks for your help. I will look at it.
>> At least for now, greedy anchoring and markfast work as expected. But I've
>> used only short word lists with simple entries.
>> Cheers,
>> Armin
>> -----Original Message-----
>> From: Peter Klügl [mailto:pkluegl@uni-wuerzburg.de]
>> Sent: Thursday, 22 January 2015 11:24
>> To: user@uima.apache.org
>> Subject: Re: RUTA and shared resources
>> Hi,
>> On 22.01.2015 at 09:20, Armin.Wegner@bka.bund.de wrote:
>>> Hello!
>>> This is a very short and simple gazetteer using RUTA.
>>> Document{->GREEDYANCHORING(true)};
>>> %s*{->MARKFAST(%s,'%s')};
>> First of all, I am sorry that I was not yet able to implement the greedy
>> matching for the gazetteers/wordlists. I have not forgotten it.
>> Just curious: does the rule perform as you expect/intend? I mean the
>> combination of greedy anchoring and the windowed stream caused by the
>> matching condition.
>>> where the first %s is replaced using String.format() by the name of
>>> the source type, the second %s is replaced by the target type name, and
>>> the third %s is replaced by the URL of a word list. Doing so, it's a
>>> little bit more flexible. This is done once in
>>> CasAnnotator_ImplBase.initialize().
>>> Then the script is executed with Ruta.apply(cas, script) in process().
>>> But that means that the word list is read again for every CAS processed.
>>> Is there any way to have RUTA use the word list as a
>>> SharedResourceObject, so that it is read once only?
>> The problem is that Ruta.apply() creates a new descriptor and a new
>> analysis engine. You could integrate the Ruta analysis engine in your
>> analysis engine as a field or something and call its process() in your
>> process() method (and initialize()). Then, the word lists should not be
>> reloaded for each process().
>> As for SharedResourceObject: This should be done, but it was never at
>> the top of my todo list. I hope I will find the time sometime.
>> You may want to take a look at UIMA-4062 and UIMA-4074, especially
>> Silvestre's comment on UIMA-4062 (29/Oct/14 19:12) where he loads a
>> table using external resources. Could also work for you maybe. Maybe
>> Silvestre can share his experiences?
>> Best,
>> Peter
>>> Regards,
>>> Armin
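
The suggestion in the quoted thread (build the script and the engine once, keep the engine as a field, and reuse it for every CAS) can be sketched in plain Java. This only illustrates the pattern and is not Ruta or UIMA code: the Engine interface, the engineFactory parameter, and the class name are hypothetical stand-ins for the real analysis engine and for however it is created. The rule template is the one from Armin's message.

```java
import java.util.function.Function;

class ReusedEngineSketch {

    /** Hypothetical stand-in for an analysis engine; NOT a UIMA or Ruta type. */
    interface Engine {
        void process(String document);
    }

    // Rule template from the thread: source type, target type, word list URL.
    static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n%s*{->MARKFAST(%s,'%s')};";

    /** Fills the three %s placeholders in the order described in the thread. */
    static String buildScript(String sourceType, String targetType, String wordListUrl) {
        return String.format(TEMPLATE, sourceType, targetType, wordListUrl);
    }

    // Assumption: creating an engine from a script is the expensive step
    // (it is where a word list would be read).
    private final Function<String, Engine> engineFactory;
    private Engine engine;

    ReusedEngineSketch(Function<String, Engine> engineFactory) {
        this.engineFactory = engineFactory;
    }

    /** Called once: builds the script and creates the engine a single time. */
    void initialize(String sourceType, String targetType, String wordListUrl) {
        String script = buildScript(sourceType, targetType, wordListUrl);
        this.engine = engineFactory.apply(script);
    }

    /** Called per document: reuses the engine held in the field. */
    void process(String document) {
        if (engine == null) {
            throw new IllegalStateException("initialize() must be called first");
        }
        engine.process(document);
    }
}
```

With this layout the factory runs only in initialize(), so however many documents pass through process(), the expensive creation step happens once.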
