Subject: Re: UIMA Ruta not capturing some XML markup with attributes?
From: Mario Gazzo
Date: Tue, 20 Oct 2015 19:22:47 +0200
To: user@uima.apache.org

I believe it should be extended, since I think a RUTA user would expect the MARKUP annotation to capture at least XML and HTML markup properly. The examples are from a PubMed Central XML file that follows the NISO JATS specification, so I will assume it is properly formatted XML without knowing all the details of the spec.

We have managed to implement a crude workaround for now, but let us know when an improved version becomes available.

Cheers
Mario

> On 20 Oct 2015, at 17:56, Peter Klügl wrote:
>
> Hi Mario,
>
> yes, and the different quote also causes problems (are these valid?).
>
> The MARKUP annotation is not created by jflex like the other annotations,
> but by a postprocessing step using a regular expression. This expression
> does not cover these cases (markupPattern in DefaultSeeder.java).
>
> Should we extend it?
>
> Best,
>
> Peter
>
> On 20.10.2015 at 17:26, Mario Gazzo wrote:
>> Hi Peter,
>>
>> RUTA doesn’t seem to capture some XML markup with attributes. Here are some examples:
>>
>>
>>
>>
>> The above markup examples are totally missing in the TokenSeed annotations. I wonder whether it is related to the dash in the attribute names, since other markup without this appears to be captured.
>>
>> Can you confirm that the dash could cause the problem?
>>
>> Cheers
>> Mario
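
For illustration only, here is a minimal Java sketch of how a regular-expression post-processing step can miss tags whose attribute names contain a dash or whose values use single quotes. Both patterns are hypothetical and are not the actual markupPattern from DefaultSeeder.java. The class name, the helper findMarkup and the sample input are likewise invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkupPatternSketch {

    // Hypothetical narrow pattern (NOT the real Ruta expression):
    // attribute names are plain word characters, values must be double-quoted.
    public static final Pattern NARROW = Pattern.compile(
            "</?\\w+(\\s+\\w+=\"[^\"]*\")*\\s*/?>");

    // Hypothetical extended pattern: names may also contain '-', ':' and '.',
    // and values may be wrapped in single or double quotes.
    public static final Pattern EXTENDED = Pattern.compile(
            "</?[\\w:.-]+(\\s+[\\w:.-]+\\s*=\\s*(\"[^\"]*\"|'[^']*'))*\\s*/?>");

    // Collects every substring of text that the given pattern matches.
    public static List<String> findMarkup(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up input; not one of the examples from the original report.
        String text = "<sec sec-type='intro'><p id=\"p1\">Text</p></sec>";
        System.out.println("narrow:   " + findMarkup(NARROW, text));
        System.out.println("extended: " + findMarkup(EXTENDED, text));
    }
}
```

On this input the narrow pattern skips the opening sec tag, because of the dash in the attribute name and the single-quoted value, while the extended pattern finds all four tags. A real fix would have to be checked against the XML specification's rules for names and attribute values.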