Message-ID: <470FBAF9.1050409@schor.com>
Date: Fri, 12 Oct 2007 14:20:41 -0400
From: Marshall Schor
To: uima-user@incubator.apache.org
Subject: Re: Iterators in CAS
References: <470F3055.4070408@coling-uni-jena.de> <470F4131.4040708@michael-baessler.de>
In-Reply-To: <470F4131.4040708@michael-baessler.de>

Here are some other approaches.

Currently, you have an approach where you iterate over [tokens] and, for
each one, want to see if there is a containing [sentence].

If your logic permits, you could iterate over [sentences], and for each,
subiterate over the [tokens]; each token found would, of course, be in
the span of the sentence you were on at that point. This takes no more
storage, and would be faster than the method of searching using
findCoverFS. But perhaps the program's logic doesn't permit this.

Another approach would be to do the above, once, somewhere in your
annotator chain, and save the result as an additional field in the
[token] - a reference to the [sentence] annotation that contains it.
Then it's just a matter of dereferencing that field to find the
containing sentence. Of course, this takes an additional 4-byte slot per
token, to hold the back reference.

One last point - the code below does an "indexed" search over all
sentences looking for the containing sentence, every time you want to
find it. This is slower than the above 2 methods, although our
implementation is pretty fast (I think it has a log(n) kind of
performance - n being the number of things in the index).

-Marshall Schor

Michael Baessler wrote:
> Hi Ekaterina,
>
> I had a similar problem when implementing the
> RegularExpressionAnnotator - how to find the covering annotation of a
> certain type for my current FS.
>
> The code is checked in to SVN at:
> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/RegularExpressionAnnotator/src/main/java/org/apache/uima/annotator/regex/impl/RegExAnnotator.java
>
> The method is called:
> findCoverFS(CAS aCAS, AnnotationFS annot, Type coverFsType)
>
> If this is exactly what you need, we can discuss moving this to
> the core framework API.
>
> Hope that helps.
>
> -- Michael
>
> Ekaterina Buyko wrote:
>> Hi all!
>>
>> In UIMA 2.1 it is possible to create a sub-iterator in order to
>> iterate over annotations which are within the begin-end span of the
>> selected type.
>>
>> For example:
>>
>> AnnotationIndex sentenceIndex = (AnnotationIndex) aJCas
>>         .getJFSIndexRepository().getAnnotationIndex(Sentence.type);
>>
>> AnnotationIndex tokenIndex = (AnnotationIndex) aJCas
>>         .getJFSIndexRepository().getAnnotationIndex(Token.type);
>>
>> // iterate over Sentences
>> FSIterator sentenceIterator = sentenceIndex.iterator();
>> while (sentenceIterator.hasNext()) {
>>
>>     Sentence sentence = (Sentence) sentenceIterator.next();
>>
>>     // iterate over Tokens
>>     FSIterator tokenIterator = tokenIndex.subiterator(sentence);
>>     // ... process the tokens covered by this sentence ...
>> }
>>
>> I would like to have more extended functionality. I need to know
>> the annotations which are in the begin-end span of the selected
>> annotation type. These annotations can overlap the span of the
>> selected type.
>>
>> For example, noun phrases. If I iterate over tokens, I would like to
>> know whether this token is inside a noun phrase or not. Now, I am
>> working with Hashtables. But I am looking for another solution.
>>
>> How could I solve this problem?
>>
>> Best regards
>>
>> Ekaterina
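
A minimal sketch of the first two approaches from the reply at the top of this message: walk the sentences once, visit the tokens under each, and store a back reference on every covered token. It deliberately uses plain-Java stand-ins instead of UIMA types so that it runs on its own. `Span`, `container`, and `CoverLinker` are hypothetical names introduced only for illustration; in an annotator the inner loop would be the subiterator, and the back reference would be the additional field on the [token] type. It assumes both lists are sorted by begin offset and that sentences do not overlap one another.

```java
import java.util.List;

// Plain-Java stand-in for an annotation; hypothetical, NOT a UIMA class.
final class Span {
    final int begin;
    final int end;
    Span container; // back reference to the covering span, null if none

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // True if 'other' lies entirely within this span.
    boolean covers(Span other) {
        return begin <= other.begin && other.end <= end;
    }
}

final class CoverLinker {
    // Assumes both lists are sorted by begin offset and that sentences
    // do not overlap each other. Each token is visited at most once.
    static void linkTokensToSentences(List<Span> sentences, List<Span> tokens) {
        int t = 0;
        for (Span sentence : sentences) {
            // Skip tokens that start before this sentence begins.
            while (t < tokens.size() && tokens.get(t).begin < sentence.begin) {
                t++;
            }
            // Visit tokens that start inside this sentence.
            while (t < tokens.size() && tokens.get(t).begin < sentence.end) {
                Span token = tokens.get(t);
                if (sentence.covers(token)) {
                    token.container = sentence; // save the back reference once
                }
                t++;
            }
        }
    }
}
```

After the single pass, finding the containing sentence of a token is just a read of `token.container`, at the cost of one extra reference per token. Tokens that fall outside every sentence, or that straddle a sentence boundary, keep a `null` back reference.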