uima-user mailing list archives

From Mario Juric ...@unsilo.ai>
Subject Re: Question about covering annotations in Ruta match semantics
Date Wed, 09 Oct 2019 20:19:15 GMT
Hi Peter,

Thanks a lot for the answer.

I am still trying to wrap my head around this. I understand the issues at play for a generic
rule engine, whereas I am looking at one isolated case only. I was just thinking that in my
particular case the covering annotation starts before the match ‘Dog Cat’, so why would its
ending right before Cat prevent the rule from firing? It doesn’t follow Dog, so a rule like
“Dog Covering {->MARK(CHASE)}” wouldn’t match either. But I understand now that it is enough
for something else to be present in the area between the two rule elements for the match to
fail. However, as you describe, with SPACE annotations present, a rule like
Dog SPACE Cat { -> MARK(CHASE)} would match despite the presence of the covering annotation.
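
As a sanity check, the behaviour described in this thread can be reproduced with a small toy
model in Python. This is only a sketch under stated assumptions, not Ruta's implementation:
the names Ann, visible_segments and fires are made up for illustration, and "empty" is taken
to mean a segment where no annotation begins at its start and none ends at its end.

```python
# Toy model only: names and logic are illustrative assumptions, not Ruta code.
from typing import NamedTuple

class Ann(NamedTuple):
    type: str
    begin: int
    end: int

def visible_segments(anns, text_len):
    """Cut [0, text_len) at every annotation boundary and keep only segments
    where some annotation begins at the start or ends at the end.
    All other ("empty") segments are treated as invisible."""
    cuts = sorted({0, text_len} | {a.begin for a in anns} | {a.end for a in anns})
    begins = {a.begin for a in anns}
    ends = {a.end for a in anns}
    return [(b, e) for b, e in zip(cuts, cuts[1:]) if b in begins or e in ends]

def fires(first_type, second_type, anns, text_len):
    """Spans where `second_type` starts at the next visible position after `first_type`."""
    visible_starts = [b for b, _ in visible_segments(anns, text_len)]
    hits = []
    for a in anns:
        if a.type != first_type:
            continue
        nxt = next((b for b in visible_starts if b >= a.end), None)
        hits += [(a.begin, c.end) for c in anns
                 if c.type == second_type and c.begin == nxt]
    return hits

# Text 'cat dog cat' with the annotations from the original question.
base = [Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11)]
long_cover = base + [Ann("Covering", 0, 8)]   # ends right before the second Cat
short_cover = base + [Ann("Covering", 0, 7)]  # ends together with Dog

print(fires("Cat", "Dog", long_cover, 11))   # [(0, 7)]
print(fires("Dog", "Cat", long_cover, 11))   # [] since segment [7,8) stays visible
print(fires("Dog", "Cat", short_cover, 11))  # [(4, 11)]
```

In this model the first space [3,4) is empty and skipped, while [7,8) counts as visible
because Covering ends at 8, which is exactly what blocks “Dog Cat”.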

Have you ever described the implementation of the matching in some paper or similar? I would
be interested to have a look at it, but maybe it’s better just to have a go at the code?
I would certainly prefer reading a high level abstract specification first though :)

Generally I cannot just trim the annotations in the real application, since some of this
whitespace is included in the annotations for various reasons. I therefore played around with
type filtering, hoping that the type filter would let me match the rules while ignoring any
filtered types. I was again surprised to find that filtering the Covering type while retaining
Cat and Dog prevents anything from being matched in this case, because it seems to make all
text parts invisible where the filtered types appear, regardless of whether they cover any
retained annotation types. So this didn’t solve my problem either, although I could of course
mark the areas I would otherwise consider trimming and either include them in the rules like a
space or filter on them, which I guess is what you suggested. It just becomes somewhat awkward
though, and it may be clearer to use RutaBasic in the rules instead.
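
The filtering observation can be written down as a toy model too. Again, this is only a Python
sketch of what I observed, not Ruta's implementation; Ann and visible_after_filter are
hypothetical names, and the assumption is that every text position covered by a filtered type
becomes invisible.

```python
# Toy model of the filtering observation; not Ruta's implementation.
from typing import NamedTuple

class Ann(NamedTuple):
    type: str
    begin: int
    end: int

def visible_after_filter(anns, filtered):
    """Return the annotations that do not overlap any annotation of a filtered type.
    Assumption: all text covered by a filtered type is invisible."""
    hidden = [(a.begin, a.end) for a in anns if a.type in filtered]

    def overlaps(a):
        return any(a.begin < e and b < a.end for b, e in hidden)

    return [a for a in anns if not overlaps(a)]

anns = [Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11), Ann("Covering", 0, 8)]
remaining = visible_after_filter(anns, {"Covering"})
print(remaining)  # only the Cat at [8,11) survives, so no two-element rule can match
```

Under that assumption the first Cat and the Dog disappear together with Covering, which
matches what I see: nothing is left for “Cat Dog” or “Dog Cat” to match.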


Cheers,
Mario

> On 9 Oct 2019, at 09:35 , Peter Klügl <peter.kluegl@averbis.com> wrote:
> 
> Hi Mario,
> 
> 
> I need to take a closer look as this is not the usual scenario :-)
> 
> 
> However, without testing, I would assume that the second rule does not
> match because the space between dog and cat is not "empty".
> 
> 
> Normally, you have a complete partitioning provided by the seeding which
> causes the RutaBasic annotations. If there are only a few annotations,
> then there needs to be a decision if a text position is visible or not
> (as you have no SPACE, BREAK and MARKUP annotation). You would expect
> that the space between the annotations is ignored, but there is actually
> no reason why Ruta should do that, as there is no information at all
> that it should be ignored (... generic system, you might want to write
> rules for whitespaces...). In order to avoid this problem in such
> situations there is the option to define empty RutaBasics as invisible.
> Those are text positions where no annotation begins or ends (and which
> are not covered by annotations), AFAIR, and where sequential matching
> could not match at all anyway. Thus, the first space is ignored, but not
> the second, because the Covering annotation ends there.
> 
> 
> Does that make sense?
> 
> 
> I think there are many options for making your rules more robust, but
> that depends on your complete system/pipeline. Is it an option to trim
> annotations in order to avoid whitespace at the beginning or end? Is
> it easy to identify these positions? You could create an annotation
> there and filter its type.
> 
> 
> 
> Best,
> 
> 
> Peter
> 
> 
> 
> Am 07.10.2019 um 10:21 schrieb Mario Juric:
>> Hi Peter,
>> 
>> I have a script that is executed without any seeders for performance reasons, and we
>> don’t need the seeded annotations in that case. I have an issue involving annotations
>> that partially cover the rule elements of interest, and I do not have a simple solution
>> for it, so I have a question about the match semantics. Let me explain it using a simple
>> example and the text ‘cat dog cat’.
>> 
>> Assume the following 4 annotation types and 2 rule statements:
>> 
>> DECLARE Covering;
>> DECLARE Cat;
>> DECLARE Dog;
>> DECLARE CHASE;
>> Cat Dog { -> MARK(CHASE)};
>> Dog Cat { -> MARK(CHASE)};
>> 
>> Assume prior to script execution the following annotations with beginnings and endings:
>> 
>> Cat[0,3[
>> Dog[4,7[
>> Cat[8,11[
>> Covering[0,8[
>> 
>> The Covering annotation is an example of the disturbing element that I observed, which
>> has little or nothing to do with what I am trying to match. It just happens to be there
>> for a reason unrelated to these rules, but it causes the second rule not to match when I
>> expected it to. Only the first rule fires, but the second also fires when I change the
>> Covering bounds to [0,7[.
>> 
>> The order in which elements are matched seems very different from how they are usually
>> selected from the CAS index, where you would get ‘Covering Cat Dog Cat’, and with this
>> order you would intuitively expect both rules to match. This would probably be overly
>> simplified though, since I would not be able to match adjacent covering annotations this
>> way, so I believe matching is somehow based on edge detection. Still, I have difficulty
>> understanding why that extra covering space makes a difference.
>> 
>> I was hoping you could provide me with some details, and I would also like to know what
>> workaround options I have. I was considering playing around with type filtering, but it
>> would require adding and removing filtered types during the script, so it didn’t seem
>> like the simplest solution. Ensuring that Covering always aligns with the end of a token
>> is another possibility in this particular case, but I still need to make the Ruta script
>> generally robust against these scenarios. Any feedback is much appreciated, thanks :)
>> 
>> Cheers,
>> Mario
>> 
> -- 
> Dr. Peter Klügl
> R&D Text Mining/Machine Learning
> 
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
> 
> Fon: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: peter.kluegl@averbis.com
> Web: https://averbis.com
> 
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
> 

