uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Question about covering annotations in Ruta match semantics
Date Wed, 09 Oct 2019 07:35:17 GMT
Hi Mario,

I need to take a closer look as this is not the usual scenario :-)

However, without testing, I would assume that the second rule does not
match because the space between dog and cat is not "empty".

Normally, the seeding provides a complete partitioning of the document,
which determines the RutaBasic annotations. If there are only a few
annotations, a decision is needed whether a text position is visible or
not (as you have no SPACE, BREAK and MARKUP annotations). You would
expect that the space between the annotations is ignored, but there is
actually no reason why Ruta should do that, as there is no information
at all that it should be ignored (it is a generic system, and you might
want to write rules for whitespace). To avoid this problem in such
situations, there is the option to define empty RutaBasics as invisible.
Those are text positions where no annotation begins or ends (and which
are not covered by annotations), AFAIR, and where sequential matching
could not match at all anyway. Thus, the first space is ignored, but not
the second, because the Covering annotation ends there.
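
To make that concrete, here is a toy Python model of the behaviour
described above. It is only a sketch and not Ruta's actual
implementation: the function names are made up, and it assumes that a
position counts as "empty" when no annotation begins at its start and
none ends at its end. Coverage is deliberately left out of the check,
since the first space is covered by Covering and is still skipped.

```python
def segments(annotations):
    """Split the text into the smallest spans delimited by annotation offsets."""
    bounds = sorted({b for _, b, _ in annotations} | {e for _, _, e in annotations})
    return list(zip(bounds, bounds[1:]))


def is_empty(segment, annotations):
    """Assumed rule: empty if nothing begins at the start or ends at the end."""
    seg_begin, seg_end = segment
    return not any(b == seg_begin or e == seg_end for _, b, e in annotations)


def matches_sequence(first, second, annotations):
    """True if some `first` is followed by a `second`, skipping empty segments."""
    segs = segments(annotations)
    for name, _, end in annotations:
        if name != first:
            continue
        for seg in segs:
            if seg[0] < end:
                continue  # segment lies before the end of the first element
            if is_empty(seg, annotations):
                continue  # invisible position, skipped by the matcher
            # First visible position: the second element must begin here.
            if any(n == second and b == seg[0] for n, b, _ in annotations):
                return True
            break
    return False


# Offsets from the example text 'cat dog cat' (end offsets are exclusive).
with_covering_to_8 = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 8)]
with_covering_to_7 = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 7)]

print(matches_sequence("Cat", "Dog", with_covering_to_8))
print(matches_sequence("Dog", "Cat", with_covering_to_8))
print(matches_sequence("Dog", "Cat", with_covering_to_7))
```

With Covering[0,8[ the span [7,8[ is not empty because Covering ends
at 8, so it stays visible and blocks 'Dog Cat'. With Covering[0,7[ the
same span is empty and is skipped.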

Does that make sense?

I think there are many options for making your rules more robust, but
that depends on your complete system/pipeline. Is it an option to trim
annotations in order to avoid whitespace at the beginning or end? Is it
easy to identify these positions? You could create an annotation there
and filter its type.
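
As a sketch of the trimming idea, independent of Ruta: the hypothetical
helper below shrinks an annotation's offsets so that it neither begins
nor ends on whitespace. Where this would run in your pipeline (before
the Ruta script, presumably) is an assumption on my side.

```python
def trim_whitespace(text, begin, end):
    """Return (begin, end) shrunk so the span has no leading/trailing whitespace.

    `end` is exclusive, matching the [begin,end[ notation used in this thread.
    """
    while begin < end and text[begin].isspace():
        begin += 1
    while end > begin and text[end - 1].isspace():
        end -= 1
    return begin, end


text = "cat dog cat"
# Covering[0,8[ spans 'cat dog ' including the trailing space.
print(trim_whitespace(text, 0, 8))
```

Applied to the example, Covering[0,8[ becomes Covering[0,7[, which is
the variant for which both rules were reported to fire.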



On 07.10.2019 at 10:21, Mario Juric wrote:
> Hi Peter,
> I have a script that is executed without any seeders for performance
> reasons, and we don’t need the seeded annotations in that case. I have
> an issue involving annotation elements that partially cover the rule
> elements of interest, and I do not have a simple solution for it, so I
> have a question about the match semantics. Let me explain it using a
> simple example and the text ‘cat dog cat’.
> Assume the following 4 annotation types and 2 rule statements:
> DECLARE Covering;
> Cat Dog { -> MARK(CHASE)};
> Dog Cat { -> MARK(CHASE)};
> Assume prior to script execution the following annotations with beginnings and endings:
> Cat[0,3[
> Dog[4,7[
> Cat[8,11[
> Covering[0,8[
> The Covering annotation is an example of the disturbing element that I
> observed, which has little or nothing to do with what I am trying to
> match. It just happens to be there for a reason unrelated to these
> rules, but it causes the second rule not to match when I expected it
> to. Only the first rule fires. The second also fires when I change the
> Covering bounds to [0,7[, though.
> The order in which elements are matched seems very different from how
> they are usually selected from the CAS index, where you would get
> ‘Covering Cat Dog Cat’, and with this order you would intuitively
> expect both rules to match. This would probably be overly simplified
> though, since I would not be able to match adjacent covering
> annotations this way, so I believe matching is somehow based on edge
> detection. Still, I have difficulty understanding why that extra
> covering space makes a difference.
> I was hoping you could provide me with some details, and I would also
> like to know what possible workaround options I have. I was considering
> playing around with type filtering, but it would require a bit of
> adding/removing types to be filtered during the script, so it didn’t
> seem like the simplest solution. Ensuring that the covering annotation
> always aligns with the end of a token is another possibility in this
> particular case, but I still need to add general robustness to the Ruta
> script against these scenarios. Any feedback is much appreciated,
> thanks :)
> Cheers,
> Mario
Dr. Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
