Thanks Peter,

The quotes are just normal quotes in the original source, but the mail software must have changed them. Sorry about the misunderstanding.

Cheers
Mario

> On 21/10/2015, at 16.03, Peter Klügl wrote:
>
> Hi,
>
> I extended the pattern to support dashes, but not the other quotes. This
> can get arbitrarily complex (and slow) if any combination of Unicode
> characters that look like quotes should be supported. I still think that
> this is not valid XML. Can you give me a link to the standard?
>
> It's maybe better to solve this in a specific use case before applying
> the seeder.
>
> Best,
>
> Peter
>
>> On 20.10.2015 at 19:22, Mario Gazzo wrote:
>> I believe it should be extended, since I think a RUTA user would expect the MARKUP annotation to capture at least XML and HTML markup properly. The examples are from a PubMed Central XML file that follows the NISO JATS specification, so I assume it is properly formatted XML without knowing all the details of the spec.
>>
>> We have managed to implement a crude workaround for now, but let us know when an improved version becomes available.
>>
>> Cheers
>> Mario
>>
>>> On 20 Oct 2015, at 17:56, Peter Klügl wrote:
>>>
>>> Hi Mario,
>>>
>>> yes, and the different quotes also cause problems (are these valid?).
>>>
>>> The MARKUP annotation is not created by JFlex like the other annotations,
>>> but by a postprocessing step using a regular expression. This expression
>>> does not cover these cases (markupPattern in DefaultSeeder.java).
>>>
>>> Should we extend it?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>> On 20.10.2015 at 17:26, Mario Gazzo wrote:
>>>> Hi Peter,
>>>>
>>>> RUTA doesn’t seem to capture some XML markup with attributes. Here are some examples:
>>>>
>>>>
>>>>
>>>> The above markup examples are totally missing in the TokenSeed annotations. I wonder whether it is related to the dash in the attribute names, since other markup without this appears to be captured.
>>>>
>>>> Can you confirm that the dash could cause the problem?
>>>>
>>>> Cheers
>>>> Mario
>
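
For readers following the thread, here is a minimal sketch of the kind of regular expression discussed above: one that accepts dashes in tag and attribute names but only straight quotes around attribute values. It is not the actual markupPattern from DefaultSeeder.java; the class name, the pattern, and the sample tags (JATS-style, not the examples from the original mail, which did not survive) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the markupPattern used by RUTA's DefaultSeeder.
class MarkupPatternSketch {

    // Simplified name rule: a letter or underscore, then word characters,
    // colons, dots or dashes (so "contrib-type" is accepted).
    private static final String NAME = "[A-Za-z_][\\w:.-]*";

    // Opening, closing or self-closing tag whose attribute values are
    // delimited by straight double or single quotes only.
    private static final Pattern MARKUP = Pattern.compile(
            "</?" + NAME
            + "(?:\\s+" + NAME + "\\s*=\\s*(?:\"[^\"]*\"|'[^']*'))*"
            + "\\s*/?>");

    // Returns every tag found in the text, in document order.
    static List<String> findMarkup(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = MARKUP.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Illustrative input with dashed attribute names.
        String sample = "<contrib contrib-type=\"author\">"
                + "<xref ref-type='bibr' rid=\"B1\">1</xref></contrib>";
        for (String tag : findMarkup(sample)) {
            System.out.println(tag);
        }
    }
}
```

With the sample input, the sketch finds the four tags, including the two with dashed attribute names. A tag whose attribute value is wrapped in typographic quotes is not matched at all, which is the behaviour Peter describes for the extended pattern. Supporting every quote look-alike would mean widening the two quote alternatives, and that is where the pattern would start to grow.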