<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Thilo and Marshall,<br>
<br>
Thanks for sharing the tip. Indeed it would be a good idea to add this
little example to the documentation.<br>
<br>
A quick comment about the Iterator methods. I had a problem with the
following piece of code:<br>
<br>
<i>while (wordFormIterator.hasNext()){<br>
WordForm wf = (WordForm)wordFormIterator.next();<br>
if (wf.getBegin()==token.getBegin() &&
wf.getEnd()==token.getEnd()){<br>
liste.add(wf);<br>
}<br>
else {<br>
// move back<br>
wordFormIterator.moveToPrevious();<br>
return liste;<br>
}<br>
}<br>
</i><br>
The last element of the iterator was never accessible because <i>hasNext()</i>
returned false despite the fact that there WAS an element left in
there. <i>moveToPrevious </i>had been previously called on this
iterator.<br>
<br>
Should not <i>hasNext() </i>return true even if the cursor has been
moved forward or backward within the iterator? Or is the use of the
legacy methods (hasNext(), next()) incompatible with the <i>moveTo* </i>methods?<br>
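<br>
To make the expectation concrete, here is a minimal sketch of the same look-ahead pattern written against the standard <i>java.util.ListIterator</i>, not against the UIMA iterator. <i>LookAheadSketch</i>, <i>Span</i> and <i>collectAligned</i> are hypothetical names made up for this illustration, and the code is written without generics, like the snippet above. It only shows the behaviour I was expecting and says nothing about how the UIMA iterator is implemented.<br>
<pre>
```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

public class LookAheadSketch {

    // Hypothetical stand-in for an annotation: it only carries offsets.
    static class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Collects consecutive spans whose offsets equal (begin, end).
    // On the first mismatch the cursor is moved back by one element,
    // so the caller will see the mismatching span again.
    static List collectAligned(ListIterator it, int begin, int end) {
        List matches = new ArrayList();
        while (it.hasNext()) {
            Span s = (Span) it.next();
            if (s.begin == begin && s.end == end) {
                matches.add(s);
            } else {
                it.previous(); // move back
                break;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List spans = new ArrayList();
        spans.add(new Span(0, 3));
        spans.add(new Span(0, 3));
        spans.add(new Span(4, 7));

        ListIterator it = spans.listIterator();
        List first = collectAligned(it, 0, 3);
        System.out.println(first.size() + " " + it.hasNext());

        List second = collectAligned(it, 4, 7);
        System.out.println(second.size() + " " + it.hasNext());
    }
}
```
</pre>
With a <i>ListIterator</i>, <i>hasNext()</i> is true again after <i>previous()</i>, so the last element is still reachable on the next call.<br>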
<br>
Thanks<br>
<br>
Julien<br>
<blockquote cite="mid4695DC8D.6040306@gmx.de" type="cite">
<pre wrap="">To be a bit more explicit, here's some code that will determine how
many tokens the longest sentence in the document contains. It's a
silly example, but it illustrates the concept. Maybe this should go
in the docs. Note: I have not actually run this code, it may not
work immediately ;-)

CAS cas = ...;
Type sentenceType = cas.getTypeSystem().getType("yourSentenceTypeName");
Type tokenType = cas.getTypeSystem().getType("yourTokenTypeName");
FSIterator sentenceIt = cas.getAnnotationIndex(sentenceType).iterator();
AnnotationIndex tokenIndex = cas.getAnnotationIndex(tokenType);
FSIterator tokenIt;
int maxLen = 0;
int currentLen;
for (sentenceIt.moveToFirst(); sentenceIt.isValid(); sentenceIt.moveToNext()) {
  tokenIt = tokenIndex.subiterator((AnnotationFS) sentenceIt.get());
  currentLen = 0;
  for (tokenIt.moveToFirst(); tokenIt.isValid(); tokenIt.moveToNext()) {
    ++currentLen;
  }
  maxLen = ((maxLen < currentLen) ? currentLen : maxLen);
}
System.out.println("Longest sentence contains " + maxLen + " tokens.");

--Thilo
Marshall Schor wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Did you consider using subIterators? These are (briefly) described
in
section 4.7.4 of the Apache UIMA Reference book, and may include exactly
what you're trying to get at - an interator over elements that are
"contained" in the span of other elements.
-Marshall
Julien Nioche wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi,
Sorry if someone already asked the question.
Is there a direct way to obtain from a Cas all the annotations of a
given type located between two positions in the text? Something like
getContained(String type,int start,int end)?
I am trying to get all the Tokens contained within a specific
Sentence. I have used iterators for doing that and compared the offset
with those of the Sentence but it is a bit tedious. Have I missed
something obvious?
Thanks
Julien
</pre>
</blockquote>
</blockquote>
</blockquote>
<br>
</body>
</html>