uima-user mailing list archives

From Julien Nioche <J.Nio...@dcs.shef.ac.uk>
Subject Iterators: problem when using standard methods in combination with moveTo*
Date Thu, 12 Jul 2007 11:00:01 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
  <title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Thilo and Marshall,<br>
<br>
Thanks for sharing the tip. Indeed it would be a good idea to add this
little example to the documentation.<br>
<br>
A quick comment about the Iterator methods. I had a problem with the
following piece of code:<br>
<br>
<i>while (wordFormIterator.hasNext()) {<br>
&nbsp;&nbsp;WordForm wf = (WordForm) wordFormIterator.next();<br>
&nbsp;&nbsp;if (wf.getBegin() == token.getBegin() &amp;&amp; wf.getEnd() == token.getEnd()) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;liste.add(wf);<br>
&nbsp;&nbsp;} else {<br>
&nbsp;&nbsp;&nbsp;&nbsp;// move back<br>
&nbsp;&nbsp;&nbsp;&nbsp;wordFormIterator.moveToPrevious();<br>
&nbsp;&nbsp;&nbsp;&nbsp;return liste;<br>
&nbsp;&nbsp;}<br>
}<br>
</i><br>
The last element of the iterator was never accessible because <i>hasNext()</i>
returned false even though there WAS still one element left. <i>moveToPrevious()</i>
had previously been called on this iterator.<br>
<br>
Should not <i>hasNext() </i>return true even if the cursor has been
moved forward or backward within the iterator? Or is the use of the
legacy methods (hasNext(), next()) incompatible with the <i>moveTo* </i>methods?<br>
<br>
Thanks<br>
<br>
Julien<br>
<blockquote cite="mid4695DC8D.6040306@gmx.de" type="cite">
  <pre wrap="">To be a bit more explicit, here's some code that will determine how
many tokens the longest sentence in the document contains.  It's a
silly example, but it illustrates the concept.  Maybe this should go
in the docs.  Note: I have not actually run this code, it may not
work immediately ;-)

    CAS cas = ...;
    Type sentenceType = cas.getTypeSystem().getType("yourSentenceTypeName");
    Type tokenType = cas.getTypeSystem().getType("yourTokenTypeName");
    FSIterator sentenceIt = cas.getAnnotationIndex(sentenceType).iterator();
    AnnotationIndex tokenIndex = cas.getAnnotationIndex(tokenType);
    FSIterator tokenIt;
    int maxLen = 0;
    int currentLen;
    for (sentenceIt.moveToFirst(); sentenceIt.isValid(); sentenceIt.moveToNext()) {
      tokenIt = tokenIndex.subiterator((AnnotationFS) sentenceIt.get());
      currentLen = 0;
      for (tokenIt.moveToFirst(); tokenIt.isValid(); tokenIt.moveToNext()) {
        ++currentLen;
      }
      maxLen = ((maxLen &lt; currentLen) ? currentLen : maxLen);
    }
    System.out.println("Longest sentence contains " + maxLen + " tokens.");

--Thilo

Marshall Schor wrote:
  </pre>
  <blockquote type="cite">
    <pre wrap="">Did you consider using subIterators?  These are (briefly) described in
section 4.7.4 of the Apache UIMA Reference book, and may be exactly
what you're trying to get at - an iterator over elements that are
"contained" in the span of other elements.

-Marshall

Julien Nioche wrote:
    </pre>
    <blockquote type="cite">
      <pre wrap="">Hi,

Sorry if someone already asked the question.
Is there a direct way to obtain from a Cas all the annotations of a
given type located between two positions in the text? Something like
getContained(String type,int start,int end)?
I am trying to get all the Tokens contained within a specific
Sentence. I have used iterators for doing that and compared the offset
with those of the Sentence but it is a bit tedious. Have I missed
something obvious?

Thanks

Julien


      </pre>
    </blockquote>
  </blockquote>
</blockquote>
<br>
</body>
</html>
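
Editorial sketch for Julien's loop above: one way to avoid mixing hasNext()/next() with moveToPrevious() is to look at the current element before advancing, so there is never anything to move back over. The code below is only a plain-Java model of that pattern, not UIMA code. SpanCursor, CursorSketch and collectMatching are hypothetical names invented for this illustration; the cursor borrows only the method names isValid(), get() and moveToNext() seen in Thilo's example, and it makes no claim about how the real FSIterator behaves internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cursor-style iterator; this is NOT UIMA's FSIterator.
// Each span is stored as {begin, end}.
final class SpanCursor {
    private final int[][] spans;
    private int pos = 0;

    SpanCursor(int[][] spans) {
        this.spans = spans;
    }

    // True while the cursor points at an element.
    boolean isValid() {
        return pos >= 0 && pos < spans.length;
    }

    // Returns the current element without advancing (a "peek").
    int[] get() {
        return spans[pos];
    }

    // Advances by one; may leave the cursor invalid (past the end).
    void moveToNext() {
        pos++;
    }
}

final class CursorSketch {
    // Collects consecutive spans whose offsets equal (begin, end).
    // The cursor is left ON the first non-matching span, so a later call
    // can pick up from there without any "move back" step.
    static List<int[]> collectMatching(SpanCursor it, int begin, int end) {
        List<int[]> out = new ArrayList<>();
        while (it.isValid()) {
            int[] span = it.get();
            if (span[0] != begin || span[1] != end) {
                break; // not consumed: the cursor still points at this span
            }
            out.add(span);
            it.moveToNext();
        }
        return out;
    }
}
```

Because the non-matching element is only inspected and never consumed, the last element stays reachable on the next call. With a real FSIterator the same loop shape would presumably use isValid(), get() and moveToNext() in place of hasNext() and next(), but that is an assumption to check against the UIMA documentation.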
