uima-user mailing list archives

From Mario Gazzo <mario.ga...@gmail.com>
Subject Re: UIMA Ruta not capturing some XML markup with attributes?
Date Tue, 20 Oct 2015 17:22:47 GMT
I believe it should be extended, since I think a RUTA user would expect the MARKUP annotation
to capture at least XML and HTML markup properly. The examples are from a PubMed Central XML
file that follows the NISO JATS specification, so I assume it is properly formatted XML,
although I do not know all the details of the spec.

We have managed to implement a crude workaround for now, but please let us know when an
improved version becomes available.
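
For illustration only, a pattern that tolerates hyphenated attribute names and typographic
closing quotes could be sketched as below. This is a hypothetical example: it is neither the
workaround mentioned above nor the actual markupPattern in DefaultSeeder.java, and the class
and method names are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkupPatternSketch {

    // Hypothetical pattern, NOT the markupPattern used by RUTA's DefaultSeeder.
    // Names may contain letters, digits, '_', ':', '.' and '-' (e.g. "ref-type").
    private static final String NAME = "[A-Za-z_:][A-Za-z0-9_:.\\-]*";

    // Assumption: accept straight quotes and the typographic quotes U+201C / U+201D,
    // because the examples in this thread close attribute values with U+201D.
    private static final String QUOTE = "[\"'\u201C\u201D]";

    // One attribute: whitespace, name, '=', quoted value without angle brackets.
    private static final String ATTRIBUTE =
            "\\s+" + NAME + "\\s*=\\s*" + QUOTE + "[^<>]*?" + QUOTE;

    // Opening, closing, or self-closing tag with zero or more attributes.
    static final Pattern MARKUP =
            Pattern.compile("</?" + NAME + "(?:" + ATTRIBUTE + ")*\\s*/?>");

    // Returns every substring of the text that the sketch pattern treats as markup.
    static List<String> findMarkup(String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = MARKUP.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "<sec sec-type=\"methods\u201D><xref ref-type=\"bibr\" "
                + "rid=\"b35-ehp0113-000220\u201D>Smith</xref></sec>";
        for (String hit : findMarkup(sample)) {
            System.out.println(hit);
        }
    }
}
```

Whether typographic quotes should be accepted at all is a separate question, since they are
not valid attribute delimiters in XML; the sketch only shows that both cases can be covered by
one expression.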

Cheers
Mario

> On 20 Oct 2015, at 17:56 , Peter Klügl <peter.kluegl@averbis.com> wrote:
> 
> Hi Mario,
> 
> yes, and the different quotes also cause problems (are these valid?).
> 
> The MARKUP annotation is not created by jflex like the other annotations,
> but by a postprocessing step using a regular expression. This expression
> does not cover these cases (markupPattern in DefaultSeeder.java).
> 
> Should we extend it?
> 
> Best,
> 
> Peter
> 
> Am 20.10.2015 um 17:26 schrieb Mario Gazzo:
>> Hi Peter,
>> 
>> RUTA doesn’t seem to capture some XML markup with attributes. Here are some examples:
>> 
>> <xref ref-type="bibr" rid="b35-ehp0113-000220”>
>> <sec sec-type="methods”>
>> 
>> The above markup examples are totally missing in the TokenSeed annotations. I wonder
>> whether it is related to the dash in the attribute names, since other markup without
>> a dash appears to be captured.
>> 
>> Can you confirm that the dash could cause the problem?
>> 
>> Cheers
>> Mario
> 

