Message-ID: <477F376D.906@gmx.de>
Date: Sat, 05 Jan 2008 08:53:17 +0100
From: Thilo Goetz
User-Agent: Thunderbird 2.0.0.9 (Windows/20071031)
MIME-Version: 1.0
To: uima-user@incubator.apache.org
Subject: Re: Non-matching filter?
References: <1199473928.4218.16.camel@jd-linux.ibsys.com>
In-Reply-To: <1199473928.4218.16.camel@jd-linux.ibsys.com>
Content-Type: multipart/mixed; boundary="------------070007060301000101050801"

--------------070007060301000101050801
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

jonathan doklovic wrote:
> Hi,
>
> I have been looking at Constraints and Filters.
> I understand how to use them to get an iterator that matches a certain
> type, but I want to do the opposite....
>
> I have annotations for 3 types: City, State, and Location (where
> location contains a city and a state)
>
> Now I want to create a filtered iterator that basically returns any city
> annotations that are NOT already within a Location annotation.
>
> Is there any way to do this?
>
> Thanks,
>
> - Jonathan

Jonathan,

first, let me make sure I understand what it is that you need. So for
example, for a sentence

"the exhibition will visit New York, NY, and Paris, France"

you might have city annotations for "New York" and "Paris", a state
annotation for "NY", and a location annotation for "New York, NY". You
would want to find the city annotation for Paris, but not the one for
New York.

If this is what you're trying to do, I don't know of an easy answer.
The fastest method would involve iterating over locations and cities in
parallel, but that gets really messy and there are a ton of boundary
cases to consider. So here's something that's a bit less efficient (one
index lookup per city), but still ok performance-wise. Unfortunately, it
still involves some relatively advanced use of CAS iterators.

Please note: I just typed this in. It compiles, but has never run. If
you can't get it to work, I'll need a real example ;-) And if this is
not the problem you're trying to solve, also let us know.

I'll stick the method here in the text, and the complete file in an
attachment.
HTH,
Thilo

  public List<AnnotationFS> findOrphanedCities(CAS cas) {
    // Obtain type system info; replace with correct type names
    Type cityType = cas.getTypeSystem().getType("city");
    Type locationType = cas.getTypeSystem().getType("location");
    Feature beginFeat = cas.getTypeSystem().getFeatureByFullName(CAS.FEATURE_FULL_NAME_BEGIN);
    Feature endFeat = cas.getTypeSystem().getFeatureByFullName(CAS.FEATURE_FULL_NAME_END);
    // Create an empty location annotation to position the iterator.
    // It is never added to the indexes, so changing its offsets is safe.
    AnnotationFS locationSearch = cas.createAnnotation(locationType, 0, 0);
    // Obtain city and location iterators
    FSIterator cityIterator = cas.getAnnotationIndex(cityType).iterator();
    FSIterator locationIterator = cas.getAnnotationIndex(locationType).iterator();
    // Result list
    List<AnnotationFS> list = new ArrayList<AnnotationFS>();
    // Iterate over all cities and collect those that are not covered by a location
    for (cityIterator.moveToFirst(); cityIterator.isValid(); cityIterator.moveToNext()) {
      AnnotationFS city = (AnnotationFS) cityIterator.get();
      // Set the search location to the position of the current city
      locationSearch.setIntValue(beginFeat, city.getBegin());
      locationSearch.setIntValue(endFeat, city.getEnd());
      // Move to a location with exactly that span, or else to the insertion point
      locationIterator.moveTo(locationSearch);
      boolean covered = false;
      if (locationIterator.isValid()) {
        AnnotationFS loc = (AnnotationFS) locationIterator.get();
        covered = (loc.getBegin() <= city.getBegin()) && (loc.getEnd() >= city.getEnd());
        if (!covered) {
          locationIterator.moveToPrevious();
        }
      } else {
        locationIterator.moveToLast();
      }
      // The annotation index sorts by begin (ascending), then end (descending), so a
      // location that starts earlier or ends later sits just before the search position.
      // Assumption: location annotations do not overlap one another.
      if (!covered && locationIterator.isValid()) {
        AnnotationFS loc = (AnnotationFS) locationIterator.get();
        covered = (loc.getBegin() <= city.getBegin()) && (loc.getEnd() >= city.getEnd());
      }
      // Keep only the cities that no location covers
      if (!covered) {
        list.add(city);
      }
    }
    return list;
  }

--------------070007060301000101050801
Content-Type: text/plain; name="CityFinder.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="CityFinder.java"

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.uima.test;

import java.util.ArrayList;
import java.util.List;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;

/**
 * Finds city annotations that are not covered by a location annotation.
 */
public class CityFinder {

  public List<AnnotationFS> findOrphanedCities(CAS cas) {
    // Obtain type system info; replace with correct type names
    Type cityType = cas.getTypeSystem().getType("city");
    Type locationType = cas.getTypeSystem().getType("location");
    Feature beginFeat = cas.getTypeSystem().getFeatureByFullName(CAS.FEATURE_FULL_NAME_BEGIN);
    Feature endFeat = cas.getTypeSystem().getFeatureByFullName(CAS.FEATURE_FULL_NAME_END);
    // Create an empty location annotation to position the iterator.
    // It is never added to the indexes, so changing its offsets is safe.
    AnnotationFS locationSearch = cas.createAnnotation(locationType, 0, 0);
    // Obtain city and location iterators
    FSIterator cityIterator = cas.getAnnotationIndex(cityType).iterator();
    FSIterator locationIterator = cas.getAnnotationIndex(locationType).iterator();
    // Result list
    List<AnnotationFS> list = new ArrayList<AnnotationFS>();
    // Iterate over all cities and collect those that are not covered by a location
    for (cityIterator.moveToFirst(); cityIterator.isValid(); cityIterator.moveToNext()) {
      AnnotationFS city = (AnnotationFS) cityIterator.get();
      // Set the search location to the position of the current city
      locationSearch.setIntValue(beginFeat, city.getBegin());
      locationSearch.setIntValue(endFeat, city.getEnd());
      // Move to a location with exactly that span, or else to the insertion point
      locationIterator.moveTo(locationSearch);
      boolean covered = false;
      if (locationIterator.isValid()) {
        AnnotationFS loc = (AnnotationFS) locationIterator.get();
        covered = (loc.getBegin() <= city.getBegin()) && (loc.getEnd() >= city.getEnd());
        if (!covered) {
          locationIterator.moveToPrevious();
        }
      } else {
        locationIterator.moveToLast();
      }
      // The annotation index sorts by begin (ascending), then end (descending), so a
      // location that starts earlier or ends later sits just before the search position.
      // Assumption: location annotations do not overlap one another.
      if (!covered && locationIterator.isValid()) {
        AnnotationFS loc = (AnnotationFS) locationIterator.get();
        covered = (loc.getBegin() <= city.getBegin()) && (loc.getEnd() >= city.getEnd());
      }
      // Keep only the cities that no location covers
      if (!covered) {
        list.add(city);
      }
    }
    return list;
  }

}

--------------070007060301000101050801--
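
For comparison, the parallel iteration over cities and locations that the message mentions can be sketched without UIMA, on plain character spans. This is only an illustration under stated assumptions, not the method from the message: Span and OrphanSweep are hypothetical stand-ins (not UIMA API), both lists are sorted by begin offset, all spans are non-empty, and locations do not overlap one another. The offsets in main are those of "New York", "Paris" and "New York, NY" in the example sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an annotation: a non-empty [begin, end) character span.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // True if this span starts at or before the other and ends at or after it.
    boolean covers(Span other) {
        return begin <= other.begin && end >= other.end;
    }

    @Override
    public String toString() {
        return "[" + begin + "," + end + ")";
    }
}

final class OrphanSweep {
    // Assumes: both lists sorted by begin, spans non-empty, locations non-overlapping.
    static List<Span> orphans(List<Span> cities, List<Span> locations) {
        List<Span> result = new ArrayList<Span>();
        int loc = 0; // index of the first location that may still cover a city
        for (Span city : cities) {
            // A location ending at or before this city's begin cannot cover
            // this city or any later one, so it is skipped for good.
            while (loc < locations.size() && locations.get(loc).end <= city.begin) {
                loc++;
            }
            // Every later location starts after city.begin, so only this one can cover.
            boolean covered = loc < locations.size() && locations.get(loc).covers(city);
            if (!covered) {
                result.add(city);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> cities = Arrays.asList(new Span(26, 34), new Span(44, 49));
        List<Span> locations = Arrays.asList(new Span(26, 38));
        System.out.println(OrphanSweep.orphans(cities, locations));
    }
}
```

Each list is walked once, which is where the speed advantage over a per-city index lookup would come from. The boundary cases the message warns about show up as soon as the assumptions are relaxed, for example nested locations or zero-length annotations.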