From: Petr Baudis
To: user@uima.apache.org
Date: Mon, 28 Apr 2014 00:10:30 +0200
Subject: Copying a CAS subset with offset correction
Message-ID: <20140427221029.GI6156@machine.or.cz>

Hi!

I'm trying to figure out how to reliably do deep copies from one CAS to another where the sofa of the target CAS is a subset of the source CAS. E.g. copying from the previous sentence to "do deep copies from one CAS to another".

One approach is to simply do something like

    import org.apache.uima.fit.util.JCasUtil;
    import org.apache.uima.jcas.tcas.Annotation;
    import org.apache.uima.util.CasCopier;

    // subCasSpan covers the part of the source sofa that becomes the target sofa
    int ofs = subCasSpan.getBegin();
    CasCopier copier = new CasCopier(srcCas.getCas(), dstCas.getCas());
    for (Annotation a : JCasUtil.selectCovered(Annotation.class, subCasSpan)) {
        Annotation a2 = (Annotation) copier.copyFs(a);
        // shift offsets so they are relative to the target sofa
        a2.setBegin(a2.getBegin() - ofs);
        a2.setEnd(a2.getEnd() - ofs);
        a2.addToIndexes();
    }

However, the problem arises when a feature structure contains references to other feature structures. If these are outside the span, their offsets will not get modified, and these "hidden" feature structures will remain referenced but become nonsensical and misleading. Ideally, they would not be copied at all and the references to them would be replaced by null. I don't think this is something that's easily achievable right now? (The possible annotation types are an open set, so manual per-annotation handling of references is not feasible in my case.)

I think the most reasonable solution would be to introduce a way to specify an offset span for the CasCopier (or a subclass), with annotations dropped if they are outside of the offset span?

Thanks,

Petr "Pasky" Baudis
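
The behaviour proposed above can be sketched in plain Java, independent of UIMA. This is only an illustration under stated assumptions: `SpanCopySketch`, `Node` and `copySubset` are hypothetical names, not part of UIMA or CasCopier, and a single `ref` field stands in for arbitrary reference features. Nodes inside the span are deep-copied with shifted offsets; anything outside the span is not copied, and references to it become null.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a span-restricted deep copy; none of these types are UIMA classes. */
final class SpanCopySketch {

    /** Hypothetical stand-in for an annotation with one reference feature. */
    static final class Node {
        final String type;
        int begin;
        int end;
        Node ref; // reference to another node, may be null

        Node(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Copies every node of {@code source} that lies within [spanBegin, spanEnd],
     * shifting offsets by spanBegin. References to nodes outside the span are
     * replaced by null instead of being copied.
     */
    static List<Node> copySubset(List<Node> source, int spanBegin, int spanEnd) {
        Map<Node, Node> copies = new IdentityHashMap<>();
        List<Node> result = new ArrayList<>();
        for (Node n : source) {
            Node c = copyNode(n, spanBegin, spanEnd, copies);
            if (c != null) {
                result.add(c);
            }
        }
        return result;
    }

    private static Node copyNode(Node n, int spanBegin, int spanEnd, Map<Node, Node> copies) {
        // Drop missing nodes and nodes that are not fully inside the span.
        if (n == null || n.begin < spanBegin || n.end > spanEnd) {
            return null;
        }
        Node existing = copies.get(n);
        if (existing != null) {
            return existing; // already copied, and already shifted exactly once
        }
        Node c = new Node(n.type, n.begin - spanBegin, n.end - spanBegin);
        copies.put(n, c); // register before following the reference so cycles terminate
        c.ref = copyNode(n.ref, spanBegin, spanEnd, copies);
        return c;
    }
}
```

Doing the span check inside the recursive copy, rather than in the outer loop only, is what keeps out-of-span feature structures from being pulled in through references. Each copy also has its offsets shifted exactly once, no matter how many references lead to it.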