Subject: Re: fuzzy matching possible?
To: user@uima.apache.org
From: Peter Klügl
Date: Fri, 3 May 2019 13:56:06 +0200

Hi,

there is/was support for a weighted edit distance in the trie lookup, but that functionality has not been maintained for many years. The dictionary lookup functionality in Ruta is overall very limited. Normally, one uses a separate analysis engine with extended logic (ConceptMapper?) to create the annotations, which are then reused in rules.
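
For readers unfamiliar with the term, a weighted edit distance can be sketched roughly as follows. This is a generic Java illustration of the idea, not the Ruta trie code: the class name WeightedEditDistance, the method distance, and its three cost parameters are assumptions chosen for this example only.

```java
// Hypothetical helper, not part of UIMA Ruta: a plain dynamic-programming
// edit distance in which insertions, deletions and substitutions each
// carry their own cost.
final class WeightedEditDistance {

    private WeightedEditDistance() {
    }

    /**
     * Returns the minimal total cost of turning source into target.
     * With all three costs set to 1.0 this is the Levenshtein distance.
     */
    static double distance(String source, String target,
                           double insertCost, double deleteCost, double substituteCost) {
        int n = source.length();
        int m = target.length();

        // previous[j] holds the cost of turning the first i-1 characters
        // of source into the first j characters of target.
        double[] previous = new double[m + 1];
        double[] current = new double[m + 1];

        // Turning the empty prefix into j characters needs j insertions.
        for (int j = 0; j <= m; j++) {
            previous[j] = j * insertCost;
        }

        for (int i = 1; i <= n; i++) {
            // Turning i characters into the empty prefix needs i deletions.
            current[0] = i * deleteCost;
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                double substitution = previous[j - 1] + (same ? 0.0 : substituteCost);
                double deletion = previous[j] + deleteCost;
                double insertion = current[j - 1] + insertCost;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            double[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[m];
    }
}
```

A dictionary entry could then be treated as a fuzzy match when the distance stays below a chosen threshold. For instance, lowering substituteCost relative to the other two costs makes single-character confusions, which are typical of OCR output, cheaper than dropped or inserted characters.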

Best,

Peter

On 03.05.2019 at 13:16, Nikolai Krot wrote:
> Hi all,
>
> Is there a possibility to match a word somehow fuzzily in UIMA Ruta
> language? I am thinking how to overcome problems with typos and OCR
> mistakes... It is hardly possible to list all possibilities how a word
> could have been broken.
>
> Best regards,
> Nikolai Krot
>

-- 
Dr. Peter Klügl
R&D Text Mining/Machine Learning
Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó