From: William Colen
Date: Fri, 25 Nov 2016 20:48:15 -0200
Subject: Re: Sorting overlapping
 annotation of same type using UIMAFIT
To: user@uima.apache.org

Great! Thank you!

2016-11-23 12:33 GMT-02:00 Marshall Schor:

> UIMA allows you to define custom indexes. So you can define a new sorted index
> (for example, let's name it "nameOfYourNewIndex") that is like the annotation
> index, except that its keys are 1) the begin feature, ascending, 2) the end
> feature, descending, and 3) the special extra feature you have to sort
> otherwise equal annotations. You would define this index to be over the most
> specific type that is the type or supertype of all Feature Structures you want
> this index to apply to (let's say you have a JCas class for this, called
> JCasClassOfTheType).
>
> Then you can use your own index (see docs), which includes your extra
> feature. You would use a form such as this:
>
> // get the index instance from the JCas
> FSIndex<JCasClassOfTheType> index =
>     jcas.getIndex("nameOfYourNewIndex", JCasClassOfTheType.class);
>
> // get an iterator from the index
> FSIterator<JCasClassOfTheType> iterator = index.iterator();
>
> With this, there is no need to have the user first collect all the instances
> and then sort them; UIMA does this for you.
>
> Hope this helps! -Marshall
>
> On 11/21/2016 8:05 PM, William Colen wrote:
> > Thank you, Marshall.
> > What if they are of the same type?
> > The workaround for me was to add a feature where I can store an integer,
> > which I use to sort the annotations. It is not a good approach because the
> > user will need to remember to sort before using them.
> >
> > Thank you
> > William
> >
> > 2016-11-21 20:10 GMT-02:00 Marshall Schor:
> >
> >> The select form you're using iterates using UIMA's built-in Annotation
> >> index.
> >> This index sorts the annotations based on 3 criteria:
> >>
> >> 1) the begin (ascending order)
> >>
> >> 2) the end (descending order)
> >>
> >> 3) the type priority
> >>
> >> You can use the 3rd criterion to set a preference ordering between two
> >> annotations of different types that have the same begin / end.
> >> You specify the type priorities as part of the Analysis Engine metadata; see
> >> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive
> >>
> >> -Marshall
> >>
> >> On 11/20/2016 9:52 PM, William Colen wrote:
> >>> Hi,
> >>>
> >>> In Portuguese we have contractions, which are words composed of, for
> >>> example, a preposition + an article, a pronoun, or an adverb.
> >>>
> >>> Example:
> >>>
> >>> Nós acreditávamos nele. (We believed him.)
> >>>
> >>> Where "nele" can be divided into "em" + "ele". (in + him)
> >>>
> >>> To properly analyze this, I created two token annotations with the same
> >>> begin and end; the first I associated with the POS tag preposition, and
> >>> the second with pronoun.
> >>>
> >>> This is especially important when we are doing chunking, because the first
> >>> token will be part of a prepositional phrase, while the second will be part
> >>> of a nominal phrase.
> >>>
> >>> How can I guarantee that when I call uimaFIT's JCasUtil.select I will get
> >>> the tokens ordered, first the preposition, second the pronoun?
> >>>
> >>> Thank you,
> >>> William
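
The sort order discussed in this thread (begin ascending, end descending, then an extra integer feature as tie-breaker) can be illustrated outside UIMA with a plain Java comparator. This is only a sketch under stated assumptions: Token, its order field, and sortLikeCustomIndex are hypothetical names for illustration, not UIMA or uimaFIT API. With a real custom sorted index, UIMA would maintain this ordering itself and no explicit sort call would be needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TokenOrderSketch {

    /** Hypothetical stand-in for a token annotation; not a UIMA class. */
    public static final class Token {
        public final int begin;
        public final int end;
        public final int order;    // hypothetical extra integer feature (tie-breaker)
        public final String label;

        public Token(int begin, int end, int order, String label) {
            this.begin = begin;
            this.end = end;
            this.order = order;
            this.label = label;
        }
    }

    /** Keys: begin ascending, end descending, then the extra feature ascending. */
    public static final Comparator<Token> INDEX_ORDER =
        Comparator.comparingInt((Token t) -> t.begin)
            .thenComparing(Comparator.comparingInt((Token t) -> t.end).reversed())
            .thenComparingInt(t -> t.order);

    /** Returns a sorted copy; the input list is left untouched. */
    public static List<Token> sortLikeCustomIndex(List<Token> tokens) {
        List<Token> copy = new ArrayList<>(tokens);
        copy.sort(INDEX_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // The contraction "nele" = "em" + "ele": two tokens with the same span,
        // followed by the sentence-final period. Offsets are illustrative.
        List<Token> tokens = Arrays.asList(
            new Token(18, 22, 2, "ele"),
            new Token(22, 23, 0, "."),
            new Token(18, 22, 1, "em"));

        List<String> labels = new ArrayList<>();
        for (Token t : sortLikeCustomIndex(tokens)) {
            labels.add(t.label);
        }
        System.out.println(String.join(" ", labels)); // prints: em ele .
    }
}
```

The tie-breaker only matters when begin and end are equal, which is exactly the case of the two tokens produced from one contraction.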