uima-user mailing list archives

From Julien Nioche <J.Nio...@dcs.shef.ac.uk>
Subject Iterators: problem when using standard methods in combination with moveTo*
Date Thu, 12 Jul 2007 11:00:01 GMT
Thilo and Marshall,<br>
Thanks for sharing the tip. Indeed it would be a good idea to add this
little example to the documentation.<br>
A quick comment about the Iterator methods. I had a problem with the
following piece of code:<br>
<i>while (wordFormIterator.hasNext()){<br>
WordForm wf = (WordForm)wordFormIterator.next();<br>
if (wf.getBegin()==token.getBegin() &amp;&amp;
else {<br>
//&nbsp; move back<br>
&nbsp;return liste;</i><br>
The last element of the iterator was never accessible because <i>hasNext()</i>
returned false even though there was still an element left.
<i>moveToPrevious()</i> had previously been called on this iterator.<br>
Shouldn't <i>hasNext()</i> return true even if the cursor has been
moved forward or backward within the iterator? Or are the
legacy methods (<i>hasNext()</i>, <i>next()</i>) incompatible with the <i>moveTo*</i> methods?<br>
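To make the question concrete, here is a minimal sketch of a loop that stays within the cursor-style methods (moveTo*, isValid(), get()) and never mixes in hasNext()/next(). ListCursor and CursorDemo are hypothetical stand-ins written for this illustration; they are not UIMA classes and do not claim to reproduce the behaviour of FSIterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified cursor over a list; NOT UIMA's FSIterator.
class ListCursor<T> {
    private final List<T> items;
    private int pos = 0;

    ListCursor(List<T> items) { this.items = items; }

    void moveToFirst()    { pos = 0; }
    void moveToNext()     { pos++; }
    void moveToPrevious() { pos--; }

    // The cursor is valid while it points at an existing element.
    boolean isValid() { return pos >= 0 && pos < items.size(); }

    T get() {
        if (!isValid()) throw new NoSuchElementException();
        return items.get(pos);
    }
}

class CursorDemo {
    // Collects everything from the cursor's current position to the end,
    // using only isValid()/get()/moveToNext().
    static <T> List<T> drain(ListCursor<T> cursor) {
        List<T> out = new ArrayList<>();
        for (; cursor.isValid(); cursor.moveToNext()) {
            out.add(cursor.get());
        }
        return out;
    }

    public static void main(String[] args) {
        ListCursor<String> c = new ListCursor<>(Arrays.asList("a", "b", "c"));
        c.moveToFirst();
        c.moveToNext();      // now on "b"
        c.moveToNext();      // now on "c"
        c.moveToPrevious();  // step back to "b"
        System.out.println(drain(c)); // prints [b, c]
    }
}
```

In this model the last element stays reachable after stepping back, because the loop condition checks the current position instead of asking whether a next element exists.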
<blockquote cite="mid4695DC8D.6040306@gmx.de" type="cite">
  <pre wrap="">To be a bit more explicit, here's some code that will determine how
many tokens the longest sentence in the document contains.  It's a
silly example, but it illustrates the concept.  Maybe this should go
in the docs.  Note: I have not actually run this code, it may not
work immediately ;-)

    CAS cas = ...;
    Type sentenceType = cas.getTypeSystem().getType("yourSentenceTypeName");
    Type tokenType = cas.getTypeSystem().getType("yourTokenTypeName");
    FSIterator sentenceIt = cas.getAnnotationIndex(sentenceType).iterator();
    AnnotationIndex tokenIndex = cas.getAnnotationIndex(tokenType);
    FSIterator tokenIt;
    int maxLen = 0;
    int currentLen;
    for (sentenceIt.moveToFirst(); sentenceIt.isValid(); sentenceIt.moveToNext()) {
      // Iterate only over the tokens covered by the current sentence.
      tokenIt = tokenIndex.subiterator((AnnotationFS) sentenceIt.get());
      currentLen = 0;
      for (tokenIt.moveToFirst(); tokenIt.isValid(); tokenIt.moveToNext()) {
        ++currentLen;
      }
      maxLen = ((maxLen &lt; currentLen) ? currentLen : maxLen);
    }
    System.out.println("Longest sentence contains " + maxLen + " tokens.");


Marshall Schor wrote:
  <blockquote type="cite">
    <pre wrap="">Did you consider using subIterators?  These are (briefly) described
in section 4.7.4 of the Apache UIMA Reference book, and may be exactly
what you're trying to get at - an iterator over elements that are
"contained" in the span of other elements.


Julien Nioche wrote:
    <blockquote type="cite">
      <pre wrap="">Hi,

Sorry if someone already asked the question.
Is there a direct way to obtain from a Cas all the annotations of a
given type located between two positions in the text? Something like
getContained(String type,int start,int end)?
I am trying to get all the Tokens contained within a specific
Sentence. I have used iterators for doing that and compared the offset
with those of the Sentence but it is a bit tedious. Have I missed
something obvious?
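
The helper asked about here can be sketched in plain Java as a linear scan over begin/end offsets. This is only an illustration of the containment test: Span and ContainedDemo are hypothetical names, the list stands in for whatever collection holds the annotations, and nothing below is a UIMA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal annotation: a type name plus character offsets.
class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() { return type + "[" + begin + "," + end + "]"; }
}

class ContainedDemo {
    // Returns the spans of the given type lying entirely within [start, end].
    static List<Span> getContained(List<Span> all, String type, int start, int end) {
        List<Span> out = new ArrayList<>();
        for (Span s : all) {
            if (s.type.equals(type) && s.begin >= start && s.end <= end) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span("Sentence", 0, 11),
            new Span("Token", 0, 5),
            new Span("Token", 6, 11),
            new Span("Token", 12, 16));  // belongs to a later sentence
        // Tokens inside the first sentence (offsets 0 to 11).
        System.out.println(getContained(spans, "Token", 0, 11));
    }
}
```

A linear scan like this touches every annotation on each call, which is the tedium described above; it is shown only to pin down what "contained" means in terms of offsets.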



