uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: get all the annotations located between two positions
Date Thu, 12 Jul 2007 07:47:25 GMT
To be a bit more explicit, here's some code that determines how many
tokens the longest sentence in the document contains.  It's a silly
example, but it illustrates the concept.  Maybe this should go in the
docs.  Note: I have not actually run this code, so it may not work
immediately ;-)

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.cas.Type;
    import org.apache.uima.cas.text.AnnotationFS;
    import org.apache.uima.cas.text.AnnotationIndex;

    CAS cas = ...;
    // Look up the sentence and token types by their fully qualified names.
    Type sentenceType = cas.getTypeSystem().getType("yourSentenceTypeName");
    Type tokenType = cas.getTypeSystem().getType("yourTokenTypeName");
    FSIterator sentenceIt = cas.getAnnotationIndex(sentenceType).iterator();
    AnnotationIndex tokenIndex = cas.getAnnotationIndex(tokenType);
    FSIterator tokenIt;
    int maxLen = 0;
    int currentLen;
    for (sentenceIt.moveToFirst(); sentenceIt.isValid(); sentenceIt.moveToNext()) {
      // The subiterator visits only the tokens covered by the current sentence.
      tokenIt = tokenIndex.subiterator((AnnotationFS) sentenceIt.get());
      currentLen = 0;
      for (tokenIt.moveToFirst(); tokenIt.isValid(); tokenIt.moveToNext()) {
        ++currentLen;
      }
      maxLen = Math.max(maxLen, currentLen);
    }
    System.out.println("Longest sentence contains " + maxLen + " tokens.");
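
For contrast, the manual offset comparison that the subiterator saves you
from writing looks roughly like the sketch below.  It is plain Java with no
UIMA on the classpath: the Span record and the contained() helper are
hypothetical names made up for illustration, not part of the UIMA API, and
the containment test is only an approximation of what a subiterator does.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only: Span and contained() are hypothetical names,
// not part of the UIMA API.
class ContainedSpans {

    // A minimal stand-in for an annotation: a label plus begin/end offsets.
    record Span(String label, int begin, int end) {}

    // Returns the spans that lie entirely within the offsets [begin, end].
    static List<Span> contained(List<Span> spans, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (s.begin() >= begin && s.end() <= end) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Token offsets for the text "A b. C d e."
        List<Span> tokens = List.of(
            new Span("A", 0, 1), new Span("b", 2, 3),
            new Span("C", 5, 6), new Span("d", 7, 8), new Span("e", 9, 10));
        // The second sentence covers offsets 5 to 11.
        System.out.println(contained(tokens, 5, 11).size());
    }
}
```

This scans every token for every sentence, which is the tedious part; the
subiterator gets the same kind of result from the annotation index.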

--Thilo

Marshall Schor wrote:
> Did you consider using subiterators?  These are (briefly) described in
> section 4.7.4 of the Apache UIMA Reference book, and may be exactly
> what you're trying to get at: an iterator over elements that are
> "contained" in the span of other elements.
> 
> -Marshall
> 
> Julien Nioche wrote:
>> Hi,
>>
>> Sorry if someone already asked the question.
>> Is there a direct way to obtain from a Cas all the annotations of a
>> given type located between two positions in the text? Something like
>> getContained(String type,int start,int end)?
>> I am trying to get all the Tokens contained within a specific
>> Sentence. I have used iterators for doing that and compared the offset
>> with those of the Sentence but it is a bit tedious. Have I missed
>> something obvious?
>>
>> Thanks
>>
>> Julien
>>
>>
