uima-user mailing list archives

From Mario Juric ...@unsilo.ai>
Subject Question about covering annotations in Ruta match semantics
Date Mon, 07 Oct 2019 08:21:32 GMT

Hi Peter,

I have a script that is executed without any seeders for performance reasons, and we don’t
need the seeded annotations in that case. I have an issue involving annotation elements that
partially cover the rule elements of interest, and I do not have a simple solution for it,
so I have a question about the match semantics. Let me explain it using a simple example and
the text ‘cat dog cat’.

Assume the following 4 annotation types and 2 rule statements:

DECLARE Covering;
DECLARE Cat;
DECLARE Dog;
DECLARE CHASE;
Cat Dog { -> MARK(CHASE)};
Dog Cat { -> MARK(CHASE)};

Assume that, prior to script execution, the following annotations exist, with these begin
and end offsets:

Cat[0,3[
Dog[4,7[
Cat[8,11[
Covering[0,8[

The Covering annotation is an example of the interfering element that I observed, which has
little or nothing to do with what I am trying to match. It just happens to be there for a
reason unrelated to these rules, but it prevents the second rule from matching where I expected
it to. Only the first rule fires. If I change the Covering bounds to [0,7[, however, the
second rule fires as well.

The order in which elements are matched seems very different from the order in which they are
usually selected from the CAS index, where you would get ‘Covering Cat Dog Cat’, and with that
order you would intuitively expect both rules to match. That view is probably oversimplified,
though, since I would not be able to match adjacent covering annotations this way, so I believe
matching is somehow based on edge detection. Still, I have difficulty understanding why
extending Covering by that one extra character (the space at offset 7) makes a difference.
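
To make the index order mentioned above concrete, here is a minimal sketch in plain Python
(not the UIMA or Ruta API). It assumes the usual annotation index ordering of begin ascending,
then end descending, and ignores type priorities; the Ann tuple and index_order helper are
hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Ann(NamedTuple):
    """Illustrative stand-in for an annotation: type name plus [begin, end[ offsets."""
    type: str
    begin: int
    end: int


TEXT = "cat dog cat"

# The four annotations from the example, in no particular order.
ANNOTATIONS = [
    Ann("Cat", 0, 3),
    Ann("Dog", 4, 7),
    Ann("Cat", 8, 11),
    Ann("Covering", 0, 8),
]


def index_order(anns):
    """Sort by begin ascending, then end descending (longer annotation first).

    Assumption: this mirrors the default annotation index ordering,
    leaving out type priorities.
    """
    return sorted(anns, key=lambda a: (a.begin, -a.end))


order = [a.type for a in index_order(ANNOTATIONS)]
print(" ".join(order))

# Covering[0,8[ spans the two tokens plus the space that follows Dog.
covered = TEXT[0:8]
print(repr(covered))
```

The sketch only reproduces the iteration order; it says nothing about how Ruta itself walks
from one rule element to the next.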

I was hoping you could provide me with some details, and I would also like to know what
workaround options I have. I was considering playing around with type filtering, but it would
require adding and removing filtered types at several points in the script, so it did not seem
like the simplest solution. Ensuring that Covering always ends at the end of a token is
another possibility in this particular case, but I still need to make the Ruta script robust
against these scenarios in general. Any feedback is much appreciated, thanks :)
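
As a rough illustration of the alignment idea, the following plain Python sketch (again not
Ruta or UIMA code) moves the end of a covering span back to the last token end it contains.
The function name snap_end_to_token and the tuple representation are assumptions made for
this example; in a real pipeline this adjustment would have to happen wherever the Covering
annotations are created.

```python
def snap_end_to_token(covering, tokens):
    """Return (begin, end) with end moved back to the last token end inside the span.

    covering: (begin, end) offsets of the covering annotation, end exclusive.
    tokens:   list of (begin, end) offsets of the tokens, end exclusive.
    If no token ends inside the span, the bounds are returned unchanged.
    """
    begin, end = covering
    ends = [t_end for (_, t_end) in tokens if begin < t_end <= end]
    return (begin, max(ends)) if ends else (begin, end)


# Token offsets for 'cat dog cat'.
tokens = [(0, 3), (4, 7), (8, 11)]

snapped = snap_end_to_token((0, 8), tokens)
print(snapped)  # (0, 7)
```

With the example offsets, Covering[0,8[ becomes Covering[0,7[, which is the variant for which
both rules were observed to fire.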

Cheers,
Mario