uima-user mailing list archives

From Andreas Weber <andreas.we...@empolis.com>
Subject Problem with matching of composite rules
Date Thu, 09 Jul 2015 14:31:49 GMT
Hi,

I have just started with Ruta, trying some fact extraction on top of my
own annotations.
I ran into problems with composite rules (rules with more than one rule
element) when more than one annotation of the same type occurs at the
same position in the input text.
It's hard to provide a reproducible example because the whole processing
is integrated into a larger part of our software, but I'll try to explain it:

In my input text I use an annotation type MY_ANNO with a String feature
"type".
My composite rule should find occurrences of MY_ANNO with the feature
"type = person" followed by MY_ANNO with the feature "type = location"
in the same sentence:

MyAnno.type == "person" {
    -> MARK ...    // do some action
}
W*?
MyAnno.type == "location" {
    -> MARK ...   // do some action
};
  
This works fine when my input annotations look like this (simplified):

1. W
   MY_ANNO (type: person)
2. W
3. W 
   MY_ANNO (type: location)  

(1, 2 and 3 are the positions of the annotations; e.g., at the first
position there are two annotations: "W" and "MY_ANNO (type: person)".)
  
The rule above doesn't match when my input has an additional MY_ANNO 
annotation at the third position:

1. W
   MY_ANNO (type: person)
2. W
3. W 
   MY_ANNO (type: somethingElse)  
   MY_ANNO (type: location)  
  
Even a simpler rule without the reluctant star operator (W*?) doesn't
match that input:

MyAnno.type == "person" {
    -> MARK ...    // do some action
}
W
MyAnno.type == "location" {
    -> MARK ...   // do some action
};

Maybe I have misunderstood how Ruta rule evaluation is supposed
to work.
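For comparison, here is a minimal Java sketch of the behavior expected for the simpler rule above. It is a hypothetical, simplified model, not Ruta's implementation: each position holds the labels of all annotations there ("W" for a word, otherwise the MY_ANNO "type" value), and a rule element is satisfied if any one of those annotations fits. A failing alternative such as "somethingElse" is simply skipped, and the next one ("location") is tried from a clean state. The class name, fields and methods are illustrative assumptions.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical, simplified model of matching a sequence of rule elements when
 * several annotations occur at the same position. Not Ruta's implementation.
 */
public class AlternativeMatchSketch {

    // Each inner list holds the labels of all annotations at one position:
    // "W" is a word token, any other label is the "type" value of a MY_ANNO.
    static final List<List<String>> FIRST_INPUT = List.of(
            List.of("W", "person"), List.of("W"), List.of("W", "location"));
    static final List<List<String>> SECOND_INPUT = List.of(
            List.of("W", "person"), List.of("W"), List.of("W", "somethingElse", "location"));
    static final List<List<String>> NO_LOCATION = List.of(
            List.of("W", "person"), List.of("W"), List.of("W", "somethingElse"));

    // The simpler rule: MyAnno.type == "person", then W, then MyAnno.type == "location".
    static final List<Predicate<String>> RULE =
            List.of("person"::equals, "W"::equals, "location"::equals);

    /** Tries to match rule elements from index elem on against positions from pos on. */
    static boolean matchAt(List<List<String>> positions, List<Predicate<String>> rule,
                           int pos, int elem) {
        if (elem == rule.size()) return true;         // every rule element matched
        if (pos >= positions.size()) return false;    // ran out of input
        for (String candidate : positions.get(pos)) { // each annotation is one alternative
            if (rule.get(elem).test(candidate)
                    && matchAt(positions, rule, pos + 1, elem + 1)) {
                return true;                          // one successful alternative suffices
            }
            // A failed alternative leaves no state behind; the next one starts fresh.
        }
        return false;
    }

    /** Returns true if the rule matches starting at any position. */
    static boolean matchesAnywhere(List<List<String>> positions, List<Predicate<String>> rule) {
        for (int start = 0; start < positions.size(); start++) {
            if (matchAt(positions, rule, start, 0)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAnywhere(FIRST_INPUT, RULE));  // true
        System.out.println(matchesAnywhere(SECOND_INPUT, RULE)); // true
        System.out.println(matchesAnywhere(NO_LOCATION, RULE));  // false
    }
}
```

In this model the second input matches just like the first, which is the result the rule was expected to produce.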

However, I tried to find the cause by debugging the Ruta code
(ruta-core 2.3.0) a little and ended up in RutaRuleElement.continueMatch().
There, the idea seems to be that when the "useAlternatives" flag is
true, the "ruleMatch" and "containerMatch" objects are copied so that
each alternative is processed from an unchanged state. But the original
"containerMatch" object can still be changed by further processing in
the stepbackMatch() call.
In my case, matching failed for the first alternative, which set the
containerMatch object to non-matching. Although the second alternative
matched, the base of the "ruleMatch" was already marked as non-matching,
so the whole ruleMatch evaluated to false.
Is there a reason for this, or is it a bug?
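The suspected problem is a case of shared mutable state between per-alternative copies. A minimal Java sketch of that pattern follows, under the assumption (taken from the description above) that a failed alternative marks a container object as non-matching. `Container`, `RuleMatch`, `evaluate` and `tryAlternatives` are hypothetical stand-ins, not Ruta classes or methods. With a shallow copy, every alternative shares the same container, so the failure of the first alternative also sinks the second. With a deep copy, each alternative starts from an untouched container.

```java
import java.util.List;

/** Hypothetical illustration of shared state between per-alternative copies. Not Ruta code. */
public class SharedStateSketch {

    /** Stand-in for a container match that can be marked as non-matching. */
    static final class Container {
        boolean matched = true;

        Container copy() {
            Container c = new Container();
            c.matched = matched;
            return c;
        }
    }

    /** Stand-in for a rule match whose result depends on its container. */
    static final class RuleMatch {
        final Container container;

        RuleMatch(Container container) { this.container = container; }

        RuleMatch shallowCopy() { return new RuleMatch(container); }     // shares the container
        RuleMatch deepCopy() { return new RuleMatch(container.copy()); } // independent container
        boolean matched() { return container.matched; }
    }

    /** Evaluates one alternative; a failure marks the container as non-matching. */
    static void evaluate(RuleMatch m, boolean alternativeMatches) {
        if (!alternativeMatches) m.container.matched = false;
    }

    /** Tries each alternative on its own copy of the original; true if any copy still matches. */
    static boolean tryAlternatives(List<Boolean> alternatives, boolean deep) {
        RuleMatch original = new RuleMatch(new Container());
        for (boolean alt : alternatives) {
            RuleMatch copy = deep ? original.deepCopy() : original.shallowCopy();
            evaluate(copy, alt);
            if (copy.matched()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // First alternative ("somethingElse") fails, second ("location") matches.
        List<Boolean> alternatives = List.of(false, true);
        System.out.println("shallow copy: " + tryAlternatives(alternatives, false)); // false
        System.out.println("deep copy:    " + tryAlternatives(alternatives, true));  // true
    }
}
```

The only difference between the two runs is whether the per-alternative copy shares the container object; whether ruta-core's copies actually behave like the shallow variant would need to be checked in the sources.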

Any help/hints/comments appreciated!  :)

Best regards, 
  Andreas


