uima-user mailing list archives

From Mario Gazzo <mario.ga...@gmail.com>
Subject Re: Very long Ruta stream initialization
Date Tue, 22 Dec 2015 16:26:08 GMT
I worked around it by specifying an empty seeders list, which removes the default seeders;
we don’t need the MARKUP annotations anymore.

I still don’t know why it created so much overhead, but at times it seemed to rival the
POS tagger in processing time.

Anyway, this leads me to the next question. Can I disable the creation of Ruta basic annotations
entirely to save processing overhead and only apply Ruta rules to other annotation types created
by other AEs such as our own?


> On 21 Dec 2015, at 16:09 , Mario Juric <mario.juric.dk@gmail.com> wrote:
> Hi Peter,
> I noticed that the initialisation in RutaEngine::initializeStream can occasionally take a
> very long time. I can’t really explain it, and it seems independent of document length,
> since I have seen this even with very small XML documents.
> The method seems to spend much of its time in the DefaultSeeder when creating MARKUP annotations
> during subiterator.moveToNext calls (line 89), and inside Subiterator it seems to be the while
> loop inside adjustForStrictForward (line 232), which is in the UIMA core classes. I haven’t
> gone into any deeper analysis yet, but I would first like to hear whether you have an idea what could
> be the main cause(s) for this?
> We use Ruta 2.3.1 with UIMA 2.8.1
> Cheers
> Mario
