uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject question on REGEXP in Ruta
Date Sun, 07 Feb 2016 23:37:17 GMT
Hi,

I am trying to write RUTA rules using regular expressions and capturing
groups. I want the matches to be line by line. I can do this using the
following script:

ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;
Document{-> RETAINTYPE(BREAK)};
Document{-> EXEC(PlainTextAnnotator)};
DECLARE Rule1NoPattern, Group1, Group2;
Line{REGEXP(".*no|No (.*)") -> Rule1NoPattern};

Given this text:
Not pregnant or nursing
Fertile patients must use effective contraception (hormonal contraception
or intra-uterine device [IUD])
No concurrent participation in another clinical trial that would preclude
the interventions or outcome assessment of this clinical trial
No other concurrent anticancer therapy

it correctly matches the last two lines and annotates them with
Rule1NoPattern.
The problem is that I want to use the capturing group information as well. I
can do this using the simple regular expression syntax:
".*no|No (.*)\n|S" -> Rule1NoPattern, 1=Group1;

If I give it just one line, say
No other concurrent anticancer therapy

it will correctly annotate the entire line with Rule1NoPattern, and "other
concurrent anticancer therapy" will be annotated with Group1.
Is there a way, using the first rule variant
Line{REGEXP(".*no|No (.*)") -> Rule1NoPattern};

to annotate the text in the capturing group?
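
For reference, the grouping behaviour of the pattern itself can be checked
outside Ruta. The following is only a sketch in Python, not Ruta code: it
assumes that REGEXP has to match the whole Line (which is consistent with the
results described above), it uses Python's re module as a stand-in for the
Java regex engine, and the second sample line is shortened. The helper name
match_line is made up for illustration. It shows that "|" splits the pattern
into ".*no" OR "No (.*)", so group 1 is only filled when the second branch
matches.

```python
import re

# Pattern copied from the rule. "|" binds loosest, so this reads as
# ".*no" OR "No (.*)"; group 1 belongs to the second branch only.
PATTERN = re.compile(r".*no|No (.*)")

# Sample lines from the text above (the second one is shortened).
LINES = [
    "Not pregnant or nursing",
    "No concurrent participation in another clinical trial",
    "No other concurrent anticancer therapy",
]


def match_line(line):
    """Return (matched, group1) for a whole-line match (hypothetical helper)."""
    m = PATTERN.fullmatch(line)  # assumption: the whole Line must match
    if m is None:
        return (False, None)
    return (True, m.group(1))


for line in LINES:
    matched, group1 = match_line(line)
    print(matched, repr(group1), "<-", line)
```

So the group text is available from the regex itself for each matching line;
what I cannot find is the Ruta syntax for turning that group into an
annotation inside the Line{REGEXP(...)} rule.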

I have tried all kinds of syntax, but none of it seems to be correct.

Thanks,
Bonnie MacKellar
