uima-user mailing list archives

From <Armin.Weg...@bka.bund.de>
Subject Re: RUTA and shared resources
Date Tue, 27 Jan 2015 12:40:08 GMT
Hi!

Looks good, but it is not part of the current release. It's not urgent enough for me to
deviate from the current stable release. Any idea when 2.3.0 will be released?

Thanks,
Armin

-----Original Message-----
From: Silvestre Losada [mailto:silvestre.losada@gmail.com]
Sent: Sunday, 25 January 2015 08:42
To: user@uima.apache.org
Subject: Re: RUTA and shared resources

Hi Armin,

Apologies for the late response. I was able to load a data table as an external
resource; I think the example shown in the comment on UIMA-4062 is self-explanatory.
If you have any issues loading it, please contact me.

Kind regards.

On 23 January 2015 at 08:59, <Armin.Wegner@bka.bund.de> wrote:

> Hi Peter!
>
> Thanks for your help. I will look at it.
> At least for now, greedy anchoring and markfast work as expected. But I've
> used only short word lists with simple entries.
>
> Cheers,
> Armin
>
>
>
>
>
> -----Original Message-----
> From: Peter Klügl [mailto:pkluegl@uni-wuerzburg.de]
> Sent: Thursday, 22 January 2015 11:24
> To: user@uima.apache.org
> Subject: Re: RUTA and shared resources
>
> Hi,
>
> On 22.01.2015 at 09:20, Armin.Wegner@bka.bund.de wrote:
> > Hello!
> >
> > This is a very short and simple gazetteer using RUTA.
> >
> > Document{->GREEDYANCHORING(true)};
> > %s*{->MARKFAST(%s,'%s')};
>
> First of all, I am sorry that I was not yet able to implement the greedy
> matching for the gazetteers/wordlists. I have not forgotten it.
> Just curious: does the rule perform as you expect/intend? I mean the
> combination of greedy anchoring and the windowed stream caused by the
> matching condition.
>
>
> >
> > where the first %s is replaced using String.format() by the name of
> > the source type, the second %s is replaced by the target type name, and
> > the third %s is replaced by the URL of a word list. Doing so makes it a
> > little bit more flexible. This is done once in
> > CasAnnotator_ImplBase.initialize().
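
A minimal Java sketch of the template step described above. The class name GazetteerScript, the method build(), and the type names and URL used in main() are placeholders for illustration and do not come from the original message. The sketch only shows how String.format() fills the three %s slots; it does not execute the script.

```java
public class GazetteerScript {
    // Rule template from the message: source type, target type, word list URL.
    static final String TEMPLATE =
        "Document{->GREEDYANCHORING(true)};\n%s*{->MARKFAST(%s,'%s')};";

    // Fills the three %s slots in order; the arguments are assumed to be valid
    // Ruta type names and a resolvable word list location.
    static String build(String sourceType, String targetType, String wordListUrl) {
        return String.format(TEMPLATE, sourceType, targetType, wordListUrl);
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        System.out.println(build("SourceType", "TargetType", "file:/data/names.txt"));
    }
}
```

Building the string once in initialize() and keeping it in a field avoids repeating the formatting for every CAS, although it does not by itself avoid reloading the word list.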
> >
> > Then the script is executed with Ruta.apply(cas, script) in process().
> > But that means that the word list is read again for every CAS processed.
> > Is there any way to have RUTA use the word list as a
> > SharedResourceObject, so that it is read once only?
>
> The problem is that Ruta.apply() creates a new descriptor and a new
> analysis engine on every call. You could instead hold the Ruta analysis
> engine in your own analysis engine, as a field or something, create it in
> your initialize(), and call its process() from your process() method.
> Then the word lists should not be reloaded for each process().
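
The idea of building the engine once and reusing it can be sketched in plain Java as follows. ScriptEngine and Annotator are hypothetical stand-ins, not UIMA or Ruta API: in a real annotator the field would hold a Ruta analysis engine created in initialize(), and process() would delegate to that engine. The loader and the loadCount field exist only to make the "loaded once" behaviour visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class ReuseEngineSketch {
    // Hypothetical stand-in for a configured engine that owns a word list.
    static final class ScriptEngine {
        private final List<String> wordList;

        ScriptEngine(List<String> wordList) {
            this.wordList = wordList;
        }

        // Counts how many word list entries occur in the text.
        long countMatches(String text) {
            return wordList.stream().filter(text::contains).count();
        }
    }

    // Hypothetical stand-in for the outer annotator.
    static final class Annotator {
        private final Supplier<List<String>> loader; // the expensive load
        private ScriptEngine engine;                 // kept as a field
        int loadCount = 0;

        Annotator(Supplier<List<String>> loader) {
            this.loader = loader;
        }

        // Called once: loads the word list and builds the inner engine.
        void initialize() {
            engine = new ScriptEngine(loader.get());
            loadCount++;
        }

        // Called per document: reuses the engine, no reload.
        long process(String text) {
            return engine.countMatches(text);
        }
    }

    public static void main(String[] args) {
        Annotator annotator = new Annotator(() -> Arrays.asList("Berlin", "Bonn"));
        annotator.initialize();
        System.out.println(annotator.process("Berlin and Bonn"));
        System.out.println(annotator.process("Paris"));
        System.out.println("loads: " + annotator.loadCount);
    }
}
```

The design choice is simply to move the costly construction out of the per-document path; whether this fits depends on how the outer engine is deployed and whether the inner engine is safe to reuse across calls.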
>
> As for SharedResourceObject: This should be done, but it was never at
> the top of my todo list. I hope I will find the time sometime.
>
> You may want to take a look at UIMA-4062 and UIMA-4074, especially
> Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a
> table using external resources. That could also work for you. Maybe
> Silvestre can share his experiences?
>
> Best,
>
> Peter
>
> >
> > Regards,
> > Armin
>
>
>