uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: to edit seed file in ruta
Date Wed, 17 Feb 2016 21:29:17 GMT
Hi,

there are several ways to annotate that without changing the seeder.

Your rule won't work for several reasons. For example, the REGEXP condition 
checks only the covered text of the matching rule element (W), which is 
only one word, so a pattern spanning the tags can never match.
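
To illustrate why a condition on a single word cannot see the surrounding tags, here is a rough Python analogy (not Ruta code). The tokenizer below is a hypothetical stand-in for the seeder, and the assumption is that the condition is tested against each word's covered text on its own:

```python
import re

text = "<w:t>AnyText</w:t>"

# Hypothetical stand-in for the seeder: split the input into
# markup tokens and word tokens.
tokens = re.findall(r"</?[\w:]+>|\w+", text)

# The pattern from the original rule.
pattern = re.compile(r"(<w:t>(.+?)</w:t>)")

# Test the pattern against each word token alone, the way a
# condition attached to W would only see one word at a time.
word_hits = [t for t in tokens if t.isalnum() and pattern.fullmatch(t)]

# Test the same pattern against the whole text instead.
whole = pattern.search(text)

print(tokens)          # the markup and word tokens
print(word_hits)       # no single word satisfies the pattern
print(whole.group(2))  # the text between the tags
```

The word "AnyText" alone never contains the tags, so the per-word check finds nothing, while the same pattern matches once it is applied to the full text.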

Here are some ways to annotate it (not tested):

Option 1: a normal rule (I think ":" is included in MARKUP for UIMA Ruta 
2.4.0)
RETAINTYPE(MARKUP);
MARKUP{REGEXP("<w:t>")} #{-> Text} MARKUP{REGEXP("</w:t>")};
or
MARKUP.ct=="<w:t>" #{-> Text} MARKUP.ct=="</w:t>";

Option 2: a simple regex rule
"<w:t>(.+?)</w:t>" -> 1 = Text;
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.language.regexprule
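
As a rough Python analogy for what "1 = Text" is meant to express (again not Ruta code, and the sample input is made up for illustration), only the span of capturing group 1 is kept for each match, not the tags around it:

```python
import re

# Hypothetical sample input with two <w:t> elements.
text = "<w:p><w:t>First</w:t><w:t>Second run</w:t></w:p>"

# For every match, keep the begin/end offsets and covered text of
# capturing group 1 only; the surrounding tags are left out.
spans = [
    (m.start(1), m.end(1), m.group(1))
    for m in re.finditer(r"<w:t>(.+?)</w:t>", text)
]

for begin, end, covered in spans:
    print(begin, end, covered)
```

The non-greedy `.+?` keeps each match inside a single element, which is why the two elements yield two separate spans.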

Option 3: use HtmlAnnotator
something like:
ENGINE utils.HtmlAnnotator;
TYPESYSTEM utils.HtmlTypeSystem;
EXEC(HtmlAnnotator, {TAG});
TAG.name=="w:t"{-> Text};

The HtmlAnnotator can be configured to only annotate the content of xml 
elements.
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.html

Best,

Peter


Am 17.02.2016 um 10:33 schrieb AmyJacksonKatrina:
> Peter Klügl <peter.kluegl@...> writes:
>
>> Hi,
>>
>> did the answer to your last mail help?
>> What changes did you try which had no effect?
>> Can you explain your use case in more detail?
>>
>> There is probably a much easier solution than to modify the seed file.
>>
>> Best,
>>
>> Peter
>>
>> Am 10.02.2016 um 06:34 schrieb AmyJacksonKatrina:
>>> How can I edit the seed file in UIMA Ruta so that the changes take
>>> effect in the Eclipse output? Whatever changes I make, the Eclipse
>>> output stays the same as usual.
>>> Thanks in advance.
>>>
>>
>
>
> Thank you Peter. I have been trying to match text like
>     <w:t>AnyText</w:t> in an XML file, but the regex pattern
> which I used in a script
>   W{REGEXP("(<w:t>(.+?)</w:t>)")->MARK(Text)};
> is not matching. So I want to know: will Ruta accept a long regex
> pattern, or do I have to put it in the seed.flex file? Please help me
> with a solution to match this text.

