uima-user mailing list archives

From Silvestre Losada <silvestre.los...@gmail.com>
Subject Re: RUTA and shared resources
Date Sun, 25 Jan 2015 07:42:25 GMT
Hi Armin,

Apologies for the late response. I was able to load a data table as an
external resource. I think the example shown in my comment on UIMA-4062 is
self-explanatory. If you have any issues loading it, please contact me.

Kind regards.

On 23 January 2015 at 08:59, <Armin.Wegner@bka.bund.de> wrote:

> Hi Peter!
>
> Thanks for your help. I will look at it.
> At least for now, greedy anchoring and markfast work as expected. But I've
> used only short word lists with simple entries.
>
> Cheers,
> Armin
>
>
>
>
>
> -----Original Message-----
> From: Peter Klügl [mailto:pkluegl@uni-wuerzburg.de]
> Sent: Thursday, 22 January 2015 11:24
> To: user@uima.apache.org
> Subject: Re: RUTA and shared resources
>
> Hi,
>
> On 22.01.2015 at 09:20, Armin.Wegner@bka.bund.de wrote:
> > Hello!
> >
> > This is a very short and simple gazetteer using RUTA.
> >
> > Document{->GREEDYANCHORING(true)};
> > %s*{->MARKFAST(%s,'%s')};
>
> First of all, I am sorry that I was not yet able to implement the greedy
> matching for the gazetteers/wordlists. I have not forgotten it.
> Just curious: does the rule perform as you expect/intend? I mean the
> combination of greedy anchoring and the windowed stream caused by the
> matching condition.
>
>
> >
> > where the first %s is replaced using String.format() by the name of
> > the source type, the second %s is replaced by the target type name,
> > and the third %s is replaced by the URL of a word list. Doing so, it's
> > a little bit more flexible. This is done once in
> > CasAnnotator_ImplBase.initialize().
> >
> > Then the script is executed with Ruta.apply(cas, script) in process().
> > But that means that the word list is read again for every CAS
> > processed. Is there any way to have RUTA use the word list as a
> > SharedResourceObject, so that it is read once only?
>
> The problem is that Ruta.apply() creates a new descriptor and a new
> analysis engine on every call. You could instead hold the Ruta analysis
> engine in your own analysis engine, for example as a field: create it in
> your initialize() and call its process() from your process() method.
> Then the word lists should not be reloaded for each process().
>
> As for SharedResourceObject: This should be done, but it was never at
> the top of my todo list. I hope I will find the time sometime.
>
> You may want to take a look at UIMA-4062 and UIMA-4074, especially
> Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a
> table using external resources. That might work for you as well. Maybe
> Silvestre can share his experiences?
>
> Best,
>
> Peter
>
> >
> > Regards,
> > Armin
>
>
>
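
For reference, the script template described in the quoted message (three
%s placeholders filled in once with String.format()) could be sketched in
plain Java roughly as below. The class name GazetteerScript and the
method name build are hypothetical, chosen only for illustration. Only the
template text itself comes from the thread. The sketch builds the script
string and nothing more: it does not call Ruta and does not address the
reloading of the word list.

```java
/**
 * Hypothetical helper (not part of UIMA Ruta): fills in the gazetteer
 * script template quoted in this thread. Intended to be called once,
 * e.g. from initialize(), so the resulting string can be reused.
 */
final class GazetteerScript {

    // Template taken from the quoted message: source type, target type,
    // and word list URL are the three %s placeholders, in that order.
    private static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n"
            + "%s*{->MARKFAST(%s,'%s')};";

    private GazetteerScript() {
        // static helper, no instances
    }

    /** Returns the script text with the three placeholders replaced. */
    static String build(String sourceType, String targetType, String wordListUrl) {
        return String.format(TEMPLATE, sourceType, targetType, wordListUrl);
    }
}
```

A call such as GazetteerScript.build("Token", "City",
"file:/lists/cities.txt") would produce the two-line script with those
values substituted. The type names and the URL here are made-up examples.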
