Thilo and Marshall,

Thanks for sharing the tip. Indeed it would be a good idea to add this little example to the documentation.

A quick comment about the Iterator methods. I had a problem with the following piece of code:

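    while (wordFormIterator.hasNext()) {
      WordForm wf = (WordForm) wordFormIterator.next();
      if (wf.getBegin() == token.getBegin() && wf.getEnd() == token.getEnd()) {
        liste.add(wf);  // assumed: collect the word form matching the token span
      } else {
        // move back so the non-matching word form is not consumed
        wordFormIterator.moveToPrevious();
        return liste;
      }
    }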

The last element of the iterator was never accessible: hasNext() returned false even though there WAS an element left. moveToPrevious() had previously been called on this iterator.

Should not hasNext() return true even if the cursor has been moved forward or backward within the iterator? Or is the use of the legacy methods (hasNext(), next()) incompatible with the moveTo* methods?
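
To make the symptom concrete, here is a toy model in plain Java. It is NOT the UIMA implementation. It rests on two assumptions that I have not checked against the UIMA source: that next() behaves like get() followed by moveToNext(), and that moveToPrevious() has no effect once the cursor has moved past the end. The Cursor class and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

public class CursorSketch {

    // Hypothetical cursor over a list; only a model of the ASSUMED semantics.
    static final class Cursor<T> {
        private final List<T> items;
        private int pos = 0;

        Cursor(List<T> items) { this.items = items; }

        boolean isValid() { return pos >= 0 && pos < items.size(); }

        T get() {
            if (!isValid()) throw new NoSuchElementException();
            return items.get(pos);
        }

        void moveToNext() { if (isValid()) pos++; }

        // Assumption: moving an invalid cursor leaves it invalid.
        void moveToPrevious() { if (isValid()) pos--; }

        // Assumption: the java.util.Iterator-style methods sit on top of the cursor.
        boolean hasNext() { return isValid(); }

        T next() { T t = get(); moveToNext(); return t; }
    }

    // next() on the last element steps past the end; under the assumptions
    // above, moveToPrevious() can then no longer bring the cursor back.
    static boolean lastElementStillReachable() {
        Cursor<String> c = new Cursor<>(Arrays.asList("a", "b", "c"));
        c.next();
        c.next();
        c.next();           // returns "c" and leaves the cursor invalid
        c.moveToPrevious(); // no effect in this model
        return c.hasNext();
    }

    // Cursor style: look at the current element with get() and only advance
    // once it has been consumed, so there is never a need to move back.
    static String firstUnconsumed() {
        Cursor<String> c = new Cursor<>(Arrays.asList("a", "a", "b"));
        while (c.isValid() && c.get().equals("a")) {
            c.moveToNext();
        }
        return c.isValid() ? c.get() : null;
    }

    public static void main(String[] args) {
        System.out.println(lastElementStillReachable());
        System.out.println(firstUnconsumed());
    }
}
```

If the real iterator behaves like this model, sticking to isValid(), get() and moveToNext() throughout would avoid the problem, since the element that fails the test is simply never stepped over.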


To be a bit more explicit, here's some code that will determine how
many tokens the longest sentence in the document contains.  It's a
silly example, but it illustrates the concept.  Maybe this should go
in the docs.  Note: I have not actually run this code, it may not
work immediately ;-)

    CAS cas = ...;
    Type sentenceType = cas.getTypeSystem().getType("yourSentenceTypeName");
    Type tokenType = cas.getTypeSystem().getType("yourTokenTypeName");
    FSIterator sentenceIt = cas.getAnnotationIndex(sentenceType).iterator();
    AnnotationIndex tokenIndex = cas.getAnnotationIndex(tokenType);
    FSIterator tokenIt;
    int maxLen = 0;
    int currentLen;
    for (sentenceIt.moveToFirst(); sentenceIt.isValid(); sentenceIt.moveToNext()) {
      tokenIt = tokenIndex.subiterator((AnnotationFS) sentenceIt.get());
      currentLen = 0;
      for (tokenIt.moveToFirst(); tokenIt.isValid(); tokenIt.moveToNext()) {
        // count each token covered by the current sentence
        ++currentLen;
      }
      maxLen = ((maxLen < currentLen) ? currentLen : maxLen);
    }
    System.out.println("Longest sentence contains " + maxLen + " tokens.");


Marshall Schor wrote:
Did you consider using subIterators?  These are (briefly) described in
section 4.7.4 of the Apache UIMA Reference book, and may include exactly
what you're trying to get at - an iterator over elements that are
"contained" in the span of other elements.


Julien Nioche wrote:

Sorry if someone already asked the question.
Is there a direct way to obtain from a Cas all the annotations of a
given type located between two positions in the text? Something like
getContained(String type,int start,int end)?
I am trying to get all the Tokens contained within a specific
Sentence. I have used iterators for doing that and compared the offset
with those of the Sentence but it is a bit tedious. Have I missed
something obvious?
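
For illustration only, the getContained(type, start, end) idea can be sketched in plain Java, independent of UIMA. The Span class and this getContained method are hypothetical stand-ins, not part of any UIMA API. The sketch assumes that "contained" means the annotation's begin and end offsets both lie within [start, end].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainedSketch {

    // Hypothetical stand-in for an annotation: a type name plus character offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns every span of the given type lying entirely within [start, end].
    // A linear scan; an index sorted by offset could stop early instead.
    static List<Span> getContained(List<Span> all, String type, int start, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            if (s.type.equals(type) && s.begin >= start && s.end <= end) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span("Sentence", 0, 11),
            new Span("Token", 0, 5),
            new Span("Token", 6, 11),
            new Span("Token", 12, 17));
        // Tokens inside the sentence spanning offsets 0..11
        System.out.println(getContained(spans, "Token", 0, 11).size());
    }
}
```

This is the same offset comparison described in the question, only wrapped in one helper so that it is written once.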