From: Nikolai Krot
Date: Thu, 29 Aug 2019 11:59:15 +0200
Subject: Re: Usage of anchors
To: user@uima.apache.org
Content-Type: multipart/alternative; boundary="0000000000007e83f005913e905e"

--0000000000007e83f005913e905e
Content-Type: text/plain; charset="UTF-8"

Hi Peter,

I have a question about this comment of yours:

> ... but the matching using literal string expressions is still really
> inefficient.

What do you mean by "inefficient"? Do you mean it is slow? Say I want to
use a literal in one hundred rules. Which is the better strategy:

1) writing the string literally in each of these 100 rules; or
2) annotating the string (using MARKTABLE) and then using the annotation
   in these 100 rules?

Best regards,
Nikolai

On Mon, Aug 26, 2019 at 2:27 PM Peter Klügl wrote:

> Hi,
>
> On 21.08.2019 at 15:47, Dominik Terweh wrote:
> > Hi Peter,
> >
> > Thanks a lot for the clarification. I was wondering about (10) too.
> >
> > Following your explanation, I was wondering: does it make sense to
> > anchor sequences, such as in (8), and is it "legal" to use multiple
> > anchors in a hierarchical fashion?
> > Like A @(B @C D)?
>
> Yes, it is "legal", but you have to be careful. (There are not enough
> unit tests for those rules.)
>
> > Also, is there a difference between the processing of sequences of
> > annotations or literals (given "A" is annotated as A and so on)?
> > A @(B C D)
> > vs.
> > "A" @("B" "C" "D")
> > vs.
> > A @("B" C "D")
>
> It should not make a difference for the result, but the matching using
> literal string expressions is still really inefficient.
>
> Best,
>
> Peter
>
> > Best
> > Dominik
> >
> > Dominik Terweh
> > Intern
> >
> > Drooms GmbH
> > Eschersheimer Landstraße 6
> > 60322 Frankfurt, Germany
> > www.drooms.com
> >
> > Phone:
> > Fax:
> > Mail: d.terweh@drooms.com
> >
> > Subscribe to the Drooms newsletter:
> > https://drooms.com/en/newsletter?utm_source=newslettersignup&utm_medium=emailsignature
> >
> > Drooms GmbH; Registered Office: Eschersheimer Landstr. 6, D-60322
> > Frankfurt am Main; Management Board: Alexandre Grellier;
> > Court of Registration: Amtsgericht Frankfurt am Main, HRB 76454;
> > Tax Office: Finanzamt Frankfurt am Main, USt-IdNr.: DE 224007190
> >
> > On 21.08.19, 12:10, "Peter Klügl" wrote:
> >
> > > Hi,
> > >
> > > On 20.08.2019 at 16:09, Dominik Terweh wrote:
> > > > Dear All,
> > > >
> > > > I have some questions regarding processing times and anchors
> > > > ("@").
> > > >
> > > > First of all, is it possible to define an anchor on a
> > > > disjunction?
> > > >
> > > > What I tested was a simple rule (1) that should start on the
> > > > element in the middle (2). Now this element had a variation (3),
> > > > but I could not use the anchor in that case anymore:
> > > >
> > > > 1) A B C; // works
> > > >
> > > > 2) A @B C; // works
> > > >
> > > > 3) A @(B|D) C; // NOT WORKING
> > > >
> > > > Is this behaviour intended or simply not supported?
> > > >
> > > > [NOTE: NOT WORKING means Eclipse does not complain, but the rule
> > > > never matches]
> > > >
> > > > The above led to some testing with a different setup (4).
> > > > However, since disjunctions don't seem to work, this was also
> > > > not valid.
> > > >
> > > > 4) A @((B C) | (D C)); // NOT WORKING
> > >
> > > Anchors at disjunct rule elements are syntactically supported but
> > > do not work correctly. I will open a bug ticket.
> > >
> > > > Is there a scenario where anchors are valid in and before
> > > > brackets? From my observation, (5)-(10) all work as expected and
> > > > all start matching on B. But do they differ in terms of
> > > > processing? I noticed slightly longer processing times in (5)
> > > > and ever so slightly in (6), but nothing very indicative. Could
> > > > (5)-(10) differ in processing time?
> > > >
> > > > 5) A @B C
> > > >
> > > > 6) (A @B C)
> > > >
> > > > 7) @(A @B C)
> > > >
> > > > 8) A @(B C)
> > > >
> > > > 9) A @(@B C)
> > > >
> > > > 10) A (@B C)
> > >
> > > Yes, since different combinations of methods are called, but I
> > > think there should not be a big difference between (5)-(9).
> > >
> > > > Since rule (10) works as expected, why does (11) work
> > > > differently and start on A but not on B and D? (This would be
> > > > useful in a scenario where B and D combined appear less often
> > > > than A.)
> > > >
> > > > 11) A ((@B C) | (@D C)); // starts matching on A
> > >
> > > I have to check that. I think (10) starts with A too.
> > >
> > > Two comments on anchors and disjunct rule elements:
> > >
> > > Anchors started as a manual option to optimize the rule execution
> > > time compared to the automatic dynamic anchoring. However, the
> > > anchor can considerably change the consequences of a rule. For me,
> > > the anchor is more of an engineering option which can also be used
> > > to speed up the rules.
> > >
> > > Disjunct rule elements are not well supported and maintained in
> > > Ruta. Their implementation is not efficient and they can lead to
> > > unintended matches. Thus, their usage is not allowed in my team
> > > and I would not recommend using them right now.
> > >
> > > (I will try to find the time to improve the implementation.)
> > >
> > > Best,
> > >
> > > Peter
> > >
> > > > Thank you in advance for your answers,
> > > >
> > > > Best
> > > >
> > > > Dominik
> > >
> > > --
> > > Dr. Peter Klügl
> > > R&D Text Mining/Machine Learning
> > >
> > > Averbis GmbH
> > > Salzstr. 15
> > > 79098 Freiburg
> > > Germany
> > >
> > > Phone: +49 761 708 394 0
> > > Fax: +49 761 708 394 10
> > > Email: peter.kluegl@averbis.com
> > > Web: https://averbis.com
> > >
> > > Headquarters: Freiburg im Breisgau
> > > Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> > > Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó

--0000000000007e83f005913e905e--
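
For illustration, option 2) from the question at the top of this message
(annotate the string once, then use the annotation in the rules) might
look roughly like the Ruta sketch below. The type names Keyword and
KeywordAmount and the literal "invoice" are hypothetical and do not come
from the thread. A plain literal rule stands in for MARKTABLE here, only
to keep the sketch free of external table resources. It shows the shape
of the approach and makes no claim about how much faster it would be.

```
// Hypothetical types, not taken from the thread.
DECLARE Keyword, KeywordAmount;

// Step 1: match the literal once and mark it with an annotation.
"invoice"{-> MARK(Keyword)};

// Step 2: later rules refer to the annotation type instead of
// repeating the literal string.
Keyword NUM{-> MARK(KeywordAmount, 1, 2)};
```

With this layout the literal appears in one rule only, and each of the
other rules would start from the Keyword annotation.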