From: Peter Klügl
Subject: UIMA Ruta: assign distant annotations to features
To: user@uima.apache.org
Date: Wed, 14 Oct 2015 11:04:46 +0200

Hi,

there were some questions about assigning annotations to features lately. This can be quite annoying right now. The problem is caused by two missing language elements: variables for storing annotations, which could be used to refer to annotations across rules, and explicit references to match contexts (rule elements), which would facilitate the assignment within rules (local variables). I hope I will be able to add both language elements in UIMA Ruta 2.4.0.

Here is a short description of how I solve these use cases in UIMA Ruta right now.
Let's assume we have the following example document:

    Some text A Some text More text B Some text B

Here, A and B stand for annotations of the types A and B. The goal is to create an annotation of type C for each annotation B, with the same offsets as B, and to assign the annotation A to a feature "a" of the annotation C. In this simple case, this can be done with GATHER:

    A # @B{-> GATHER(C, "a" = 1)};

This rule starts to match on an annotation of type B (rule element 3). Then, it searches for the next annotation of type A in front of it (rule element 1). If this succeeds, the GATHER action creates a new annotation of type C with the offsets of the current annotation B (rule element 3) and assigns the annotation matched by rule element 1 to the feature "a" of the annotation C. Then, the rule continues with the next annotation of type B. Thus, we get an annotation C for each annotation B if there is an annotation A somewhere before it.

Unfortunately, this is often not possible, e.g., because we need to assign additional values to other features of the annotation C, which is not supported by GATHER. Or we cannot directly use type A because the actual annotation we want to assign is stored in a feature of A.

In order to work around the missing language elements that could be used to solve this problem, I normally introduce a projection type: an annotation that stores an annotation but is placed somewhere else in the document.

    DECLARE Projection (Annotation value);
    A # @B{-> GATHER(Projection, "value" = 1)};
    B{-> CREATE(C, "x" = "String"), C.a = Projection.value};

The first line declares a new type with a feature "value". The first rule (second line) creates a Projection at the position of each B and stores the preceding annotation A in its feature "value", which connects A with each position of B. Then, the last rule creates the annotation C, stores "String" in the feature "x", and assigns the annotation stored in the feature "value" of the annotation Projection, which has the same offsets as B (the match context), to the feature "a" of the annotation C.
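For anyone who wants to try the projection approach end to end, the rules above could be embedded in a small script along the following lines. This is only a sketch under assumptions: the declarations of A, B and C (including the types chosen for the features "a" and "x") and the two REGEXP rules that create A and B from the tokens "A" and "B" are added here for illustration and are not part of the use case itself. Note that GATHER has to fill the feature "value", since this is the only feature declared for Projection.

```
// Assumed type system for this example (not part of the original use case).
DECLARE A, B;
DECLARE C (Annotation a, STRING x);
DECLARE Projection (Annotation value);

// Assumption: A and B are created from the single-letter tokens "A" and "B".
W{REGEXP("A") -> MARK(A)};
W{REGEXP("B") -> MARK(B)};

// Create a Projection on each B that refers to an annotation A in front of it.
A # @B{-> GATHER(Projection, "value" = 1)};

// Create C on each B and copy the projected annotation into its feature "a".
B{-> CREATE(C, "x" = "String"), C.a = Projection.value};
```

With the example document, this should yield one annotation C per annotation B, each referring to the same annotation A in its feature "a".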
This approach does not fit all use cases as it is, but it can be adapted to most of them. For example, the CREATE action and the feature assignment can be separated in order to use them in different BLOCKs, or a feature of A can be stored in Projection instead of A itself.

This problem can also be solved with other approaches in UIMA Ruta. Feel free to share your approaches with us.

If anyone thinks that this mail was useful, then I would add it to the documentation.

Best,

Peter