Subject: Re: Restrictions on sofa data array
From: Eddie Epstein
To: user@uima.apache.org
Date: Tue, 27 Apr 2010 15:58:05 -0400

On Tue, Apr 27, 2010 at 10:59 AM, Thilo Goetz wrote:
> My understanding is that he wants the tokens as primitives,
> not the characters.  Annotation offsets could then be token
> offsets, not character offsets.  That's perfectly reasonable
> for some tasks.  We usually create annotations with the start
> offset being the start of some token, and the end offset the
> end of some token.  Then it's hard to find the tokens that
> are "covered" by the annotation, which is why we have
> subiterators, which are not super efficient.  And so on.
> I like the idea, but I have no idea how compatible it is with
> UIMA's idea of views and sofas.

A StringArrayFS can be used as Sofa data. Moreover, a new annotation
type derived from AnnotationBase can be used to point into the
StringArray, and if using JCas it could have a getCoveredText() method
or other functional capabilities.

Thanks for explaining the scenario!
Eddie
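
To make the token-offset idea in this reply concrete, here is a minimal plain-Java sketch. It does not use the UIMA API: TokenSofa and TokenAnnotation are hypothetical stand-ins for a view whose Sofa data is a StringArrayFS and for a JCas type derived from AnnotationBase. The begin/end fields, the exclusive end offset, and joining tokens with a single space in getCoveredText() are all assumptions made for illustration. In UIMA itself the token array would be attached to a view as Sofa data (the CAS interface has a setSofaDataArray method for that), and the annotation type would be declared in a type system descriptor.

```java
import java.util.Arrays;
import java.util.List;

// Small demo; declared first so the file can also be run directly with `java`.
final class TokenSofaDemo {
    public static void main(String[] args) {
        TokenSofa sofa = new TokenSofa("We", "usually", "create", "annotations");
        // Covers tokens 1 and 2; the end offset is exclusive (an assumption).
        TokenAnnotation ann = new TokenAnnotation(sofa, 1, 3);
        System.out.println(ann.getCoveredText()); // prints "usually create"
    }
}

// Hypothetical stand-in for a view whose Sofa data is an array of tokens.
final class TokenSofa {
    private final String[] tokens;

    TokenSofa(String... tokens) {
        this.tokens = tokens.clone(); // defensive copy, Sofa data is immutable once set
    }

    int size() {
        return tokens.length;
    }

    // Returns the tokens in [begin, end), with no iteration over other annotations.
    List<String> covered(int begin, int end) {
        if (begin < 0 || end > tokens.length || begin > end) {
            throw new IndexOutOfBoundsException("bad token range " + begin + ".." + end);
        }
        return Arrays.asList(Arrays.copyOfRange(tokens, begin, end));
    }
}

// Hypothetical analogue of a type derived from AnnotationBase:
// its offsets index tokens in the array rather than characters in a string.
final class TokenAnnotation {
    private final TokenSofa sofa;
    private final int begin; // index of the first covered token
    private final int end;   // index one past the last covered token

    TokenAnnotation(TokenSofa sofa, int begin, int end) {
        sofa.covered(begin, end); // validates the range up front
        this.sofa = sofa;
        this.begin = begin;
        this.end = end;
    }

    List<String> getCoveredTokens() {
        return sofa.covered(begin, end);
    }

    // Convenience method of the kind a hand-edited JCas class could offer.
    String getCoveredText() {
        return String.join(" ", getCoveredTokens());
    }
}
```

With token offsets, finding the covered tokens is a direct array slice, which is the lookup that otherwise needs a subiterator when offsets are character positions.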