Subject: Re: fuzzy matching possible?
From: Peter Klügl
To: user@uima.apache.org
Date: Sat, 4 May 2019 11:40:52 +0200

Hi,

that mainly depends on the type system. Your rule could look something like:

w:Word{OR(w.lemma == "arbeiten", w.ct == "gearbeitet")};

Best,

Peter

On 03.05.2019 at 18:28, Nikolai Krot wrote:
> Hi Peter,
>
> Thank you for your prompt reply.
>
> Speaking about pre-annotation with another engine. Say, I managed to
> annotate words of interest and additionally set an attribute, something
> like this
>
> ... gearbeitet...
>
> Is there a simple way to configure the object of matching in Ruta rules so
> that the rule matches over the actual text ("gearbeitet" in our case) or the
> value of the attribute "lemma" ("arbeiten" in our case)?
> That is, the match should return True if either of the fields evaluates to True.
> This would make some rules simpler.
>
> Best regards,
> Nikolai
>
> On Fri, May 3, 2019 at 2:03 PM Peter Klügl wrote:
>
>> Hi,
>>
>> there is/was support for a weighted edit distance in the trie lookup,
>> but that functionality has not been maintained for many years.
>>
>> The dictionary lookup functionality in Ruta is overall very limited.
>> Normally, one uses a separate analysis engine with extended logic
>> (ConceptMapper?) for creating the annotations, which are then later
>> reused in rules.
>>
>> Best,
>>
>> Peter
>>
>> On 03.05.2019 at 13:16, Nikolai Krot wrote:
>>> Hi all,
>>>
>>> Is there a possibility to match a word somehow fuzzily in the UIMA Ruta
>>> language? I am thinking about how to overcome problems with typos and OCR
>>> mistakes... It is hardly possible to list all the ways a word
>>> could have been broken.
>>>
>>> Best regards,
>>> Nikolai Krot
>>>
>> --
>> Dr. Peter Klügl
>> R&D Text Mining/Machine Learning
>>
>> Averbis GmbH
>> Salzstr. 15
>> 79098 Freiburg
>> Germany
>>
>> Phone: +49 761 708 394 0
>> Fax: +49 761 708 394 10
>> Email: peter.kluegl@averbis.com
>> Web: https://averbis.com
>>
>> Headquarters: Freiburg im Breisgau
>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>
>>

--
Peter Klügl
R&D Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Phone: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
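
P.S. For readers of the archive: the thread suggests doing the fuzzy part outside Ruta, in a separate step that creates the annotations, and then matching on features in the rules. A minimal sketch of what such an edit-distance lookup might look like is below. It is not Ruta's trie lookup and not ConceptMapper; it uses a plain unweighted Levenshtein distance rather than the weighted variant mentioned above, and the names edit_distance, fuzzy_lookup, lexicon and max_distance are all hypothetical, chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain (unweighted) Levenshtein distance between two strings."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb, or keep on match
            ))
        prev = curr
    return prev[-1]


def fuzzy_lookup(token, lexicon, max_distance=1):
    """Return the lexicon entries within max_distance edits of token.

    Comparison is case-insensitive; lexicon is any iterable of strings.
    """
    t = token.lower()
    return [entry for entry in lexicon
            if edit_distance(t, entry.lower()) <= max_distance]


if __name__ == "__main__":
    # "gearbeilet" stands in for an OCR confusion of "t" and "l".
    print(fuzzy_lookup("gearbeilet", ["gearbeitet", "arbeiten"]))
```

A pre-annotation step along these lines could store the matched lexicon entry in a feature (such as the "lemma" attribute discussed above), so that the Ruta rule itself only needs an exact comparison. The linear scan over the lexicon is fine for small word lists; larger dictionaries would need an index.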