uima-user mailing list archives

From <Armin.Weg...@bka.bund.de>
Subject Re: RUTA and shared resources
Date Tue, 27 Jan 2015 12:40:08 GMT

Looks good, but it is not part of the current release, and this is not urgent enough
for me to deviate from the current stable release. Any idea when 2.3.0 will be released?


-----Original Message-----
From: Silvestre Losada [mailto:silvestre.losada@gmail.com]
Sent: Sunday, 25 January 2015 08:42
To: user@uima.apache.org
Subject: Re: RUTA and shared resources

Hi Armin,

Apologies for the late response. I was able to load a data table as an external
resource; I think the example shown in the comment (on UIMA-4062) is self-explanatory.
If you have any issues loading it, please contact me.

Kind regards.

On 23 January 2015 at 08:59, <Armin.Wegner@bka.bund.de> wrote:

> Hi Peter!
> Thanks for your help. I will look at it.
> At least for now, greedy anchoring and markfast work as expected. But I've
> used only short word lists with simple entries.
> Cheers,
> Armin
> -----Original Message-----
> From: Peter Klügl [mailto:pkluegl@uni-wuerzburg.de]
> Sent: Thursday, 22 January 2015 11:24
> To: user@uima.apache.org
> Subject: Re: RUTA and shared resources
> Hi,
> On 22.01.2015 at 09:20, Armin.Wegner@bka.bund.de wrote:
> > Hello!
> >
> > This is a very short and simple gazetteer using RUTA.
> >
> > Document{->GREEDYANCHORING(true)};
> > %s*{->MARKFAST(%s,'%s')};
> First of all, I am sorry that I was not yet able to implement the greedy
> matching for the gazetteers/wordlists. I have not forgotten it.
> Just curious: does the rule perform as you expect/intend? I mean the
> combination of greedy anchoring and the windowed stream caused by the
> matching condition.
> >
> > where the first %s is replaced using String.format() by the name of
> > the source type, the second %s is replaced by the target type name, and
> > the third %s is replaced by the URL of a word list. That makes it a
> > little more flexible. This is done once in
> > CasAnnotator_ImplBase.initialize().
> >
> > Then the script is executed with Ruta.apply(cas, script) in process().
> > But that means that the word list is read again for every CAS processed.
> > Is there any way to have RUTA use the word list as a
> > SharedResourceObject, so that it is read only once?
> The problem is that Ruta.apply() creates a new descriptor and a new
> analysis engine on every call. You could hold the Ruta analysis engine in
> your analysis engine as a field or something and call its process() in your
> process() method (and initialize()). Then the word lists should not be
> reloaded for each process().
> As for SharedResourceObject: This should be done, but it was never at
> the top of my todo list. I hope I will find the time sometime.
> You may want to take a look at UIMA-4062 and UIMA-4074, especially
> Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a
> table using external resources. That might work for you as well. Maybe
> Silvestre can share his experiences?
> Best,
> Peter
> >
> > Regards,
> > Armin
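
A minimal sketch of the load-once pattern discussed in this thread: the rule script is built with String.format() and the word list is read a single time during initialization, then reused for every document. This is plain Java and does not call the UIMA or Ruta API; the class CachedGazetteerSketch, its method names, and the substring matching in process() are hypothetical stand-ins for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical stand-in for an annotator; it does NOT use the UIMA or Ruta API.
class CachedGazetteerSketch {
    // Rule template from the thread: source type, target type, word list location.
    static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n%s*{->MARKFAST(%s,'%s')};";

    private String script;       // formatted once, reused afterwards
    private List<String> words;  // word list, read once
    private int loadCount = 0;   // counts how often the list was read from disk

    // Plays the role of initialize(): format the script and read the word list.
    void initialize(String sourceType, String targetType, String wordListPath) {
        script = String.format(TEMPLATE, sourceType, targetType, wordListPath);
        try {
            words = Files.readAllLines(Paths.get(wordListPath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        loadCount++;
    }

    // Plays the role of process(): uses the cached list, no file access here.
    // The substring check is a placeholder for real rule matching.
    int process(String documentText) {
        int hits = 0;
        for (String word : words) {
            if (!word.isEmpty() && documentText.contains(word)) {
                hits++;
            }
        }
        return hits;
    }

    String getScript() { return script; }

    int getLoadCount() { return loadCount; }
}
```

In a real annotator the same shape would apply with the Ruta analysis engine held as a field, as Peter suggests above; how that engine is constructed is deliberately left out of this sketch.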