From: Nikolai Krot
Date: Tue, 7 May 2019 13:58:27 +0200
Subject: Re: fuzzy matching possible?
To: user@uima.apache.org

Hi Peter,

Thank you for your reply.

> at the end you need to check both, but you could maybe refactor the
> checks in a new condition like (not tested):
>
> CONDITION LemmaCT(ANNOTATION word, STRING check) = OR(word.lemma == check, word.ct == check);
>
> w: Word{LemmaCT(w, "gearbeitet")};

This looks interesting and is indeed shorter. Somehow I missed the section
about macros in the Ruta manual.

Thanks again and best regards,
Nikolai

> Best,
>
> Peter
>
> On 04.05.2019 at 13:44, Nikolai Krot wrote:
> > Hi Peter,
> >
> > Thank you for the answer.
> >
> >> that mainly depends on the typesystem. Your rule could look something like:
> >>
> >> w:Word{OR(w.lemma == "arbeiten", w.ct == "gearbeitet")};
> >
> > I know of this syntax. My question is whether there is a shorter form to
> > tell that whenever I need to match word text, the matching should check
> > both lemma and ct fields. Think of a few dozen rules like this...
> >
> > Best regards,
> > Nikolai
> >>
> >> Best,
> >>
> >> Peter
> >>
> >> On 03.05.2019 at 18:28, Nikolai Krot wrote:
> >>> Hi Peter,
> >>>
> >>> Thank you for your prompt reply.
> >>>
> >>> Speaking about pre-annotation with another engine. Say, I managed to
> >>> annotate words of interest and additionally set an attribute, something
> >>> like this
> >>>
> >>> ... gearbeitet...
> >>>
> >>> Is there a simple way to configure the object of matching in Ruta rules so
> >>> that the rule matches over the actual text ("gearbeitet" in our case) or the
> >>> value of the attribute "lemma" ("arbeiten" in our case)?
> >>> That is, the match should return True if either of the fields evaluates to True.
> >>> This would make some rules simpler.
> >>>
> >>> Best regards,
> >>> Nikolai
> >>>
> >>> On Fri, May 3, 2019 at 2:03 PM Peter Klügl wrote:
> >>>> Hi,
> >>>>
> >>>> there is/was support for a weighted edit distance in the trie lookup,
> >>>> but that functionality has not been maintained for many years.
> >>>>
> >>>> The dictionary lookup functionality in Ruta is overall very limited.
> >>>> Normally, one uses a separate analysis engine with extended logic
> >>>> (ConceptMapper?) for creating the annotations, which are then later
> >>>> reused in rules.
> >>>>
> >>>> Best,
> >>>>
> >>>> Peter
> >>>>
> >>>> On 03.05.2019 at 13:16, Nikolai Krot wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Is there a possibility to match a word somehow fuzzily in the UIMA Ruta
> >>>>> language? I am thinking about how to overcome problems with typos and OCR
> >>>>> mistakes... It is hardly possible to list all the ways a word
> >>>>> could have been broken.
> >>>>>
> >>>>> Best regards,
> >>>>> Nikolai Krot
> >>>>>
> >>>> --
> >>>> Dr. Peter Klügl
> >>>> R&D Text Mining/Machine Learning
> >>>>
> >>>> Averbis GmbH
> >>>> Salzstr. 15
> >>>> 79098 Freiburg
> >>>> Germany
> >>>>
> >>>> Phone: +49 761 708 394 0
> >>>> Fax: +49 761 708 394 10
> >>>> Email: peter.kluegl@averbis.com
> >>>> Web: https://averbis.com
> >>>>
> >>>> Headquarters: Freiburg im Breisgau
> >>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> >>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
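
The first reply in this thread mentions edit distance and suggests creating the annotations in a separate engine before the Ruta rules run. As a rough illustration of that idea only, here is a minimal Python sketch. It is not Ruta or UIMA code, it uses plain (unweighted) Levenshtein distance rather than the weighted variant mentioned above, and the names edit_distance, fuzzy_lookup and the one-entry lexicon are made up for the example.

```python
def edit_distance(a, b):
    """Plain (unweighted) Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_lookup(token, lexicon, max_dist=1):
    """Return (surface form, lemma) of the closest lexicon entry within
    max_dist edits of the lowercased token, or None if nothing is close."""
    best = None
    for form, lemma in lexicon.items():
        d = edit_distance(token.lower(), form)
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, form, lemma)
    return None if best is None else (best[1], best[2])


# Hypothetical lexicon: surface form -> lemma (assumption for this sketch).
lexicon = {"gearbeitet": "arbeiten"}

for token in ["gearbeitet", "gearbeilet", "Freiburg"]:
    print(token, fuzzy_lookup(token, lexicon))
```

In a real pipeline the matched lemma would be written to an annotation feature, so that a rule such as the OR(...) condition quoted above can test it instead of the raw, possibly misspelled text.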