Katrin,
Yes. There is a penalty for iterating through all the annotations of a
given type: your function visits every Abbreviation in the document for
each entity, no matter how small the entity is. Imagine doing the same
with a Token type and a document with 10K tokens (not uncommon).
We wrote a method that doesn't have this performance penalty and that
bypasses the type priorities.
Please see:
http://cslr.colorado.edu/ClearTK/index.cgi/chrome/site/api/src-html/edu/colorado/cleartk/util/AnnotationRetrieval.html#line.237
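A minimal sketch of the general idea (this is not the ClearTK code linked above): if annotations are kept in the order UIMA's annotation index uses (begin ascending, then end descending), you can binary-search to the first annotation starting at the entity's begin and stop as soon as one starts after the entity's end. `Span` and `CoveredSpans` are hypothetical stand-ins, not UIMA or ClearTK classes, and type priorities are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an annotation: just a begin/end character span. */
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

/** Sketch: find spans covered by [begin, end] without scanning the whole list. */
final class CoveredSpans {

    /** Assumed index order: begin ascending, then end descending (no type priorities). */
    static final Comparator<Span> INDEX_ORDER =
            Comparator.<Span>comparingInt(s -> s.begin)
                      .thenComparing((a, b) -> Integer.compare(b.end, a.end));

    /** Binary search: index of the first span whose begin is >= the given offset. */
    static int lowerBound(List<Span> sorted, int begin) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < begin) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** All spans with begin >= begin and end <= end, in index order. */
    static List<Span> covered(List<Span> sorted, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (int i = lowerBound(sorted, begin); i < sorted.size(); i++) {
            Span s = sorted.get(i);
            if (s.begin > end) {
                break; // sorted by begin: no later span can be covered
            }
            if (s.end <= end) {
                result.add(s); // starts and ends inside the entity
            }
        }
        return result;
    }
}
```

Each lookup then costs O(log n) plus one step per annotation that begins inside the entity, instead of a pass over all n annotations of the type. Inside UIMA, the analogous positioning step would be done with `FSIterator.moveTo` on the annotation index rather than a hand-written binary search.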
Philip
Katrin Tomanek wrote:
> Hi Thilo,
>
>>> Actually, I am using the subiterator functionality and I need
>>> type priorities to be set. But I don't want the user of a
>>> component to be able to alter the type priorities by modifying the
>>> descriptors where type priorities typically are set.
>>
>> Is that necessary? The user mustn't change the type system either,
>> else the annotator will no longer work. Wouldn't you say it's part
>> of the contract that such metadata is not changed by the user?
> Well, yeah, I could see it that way. But if I know, at the time of
> writing the component, which types it will use and what the priorities
> should look like, why should I risk letting the user violate that "contract"?
>
> But to come back to the real problem: I am asking about type
> priorities because I am using the subiterator. For me, type priorities
> are necessary when the annotations I am interested in have exactly the
> same offsets as the annotation with which I constrain my subiterator. So,
> the question for me is: shall I use the subiterator (and define priorities
> in the component's descriptor) or shall I write my own function (see
> below) doing what I want without the type priority problems:
>
> ----- SNIP --------
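> import java.util.ArrayList;
> import java.util.Iterator;
> import org.apache.uima.jcas.JFSIndexRepository;
>
> // gives me a list of all abbreviations which are "within" my entity
> // (Entity and Abbreviation are JCas types from my own type system)
> public ArrayList<Abbreviation> getAbbreviations(Entity entity,
>         JFSIndexRepository index) {
>     ArrayList<Abbreviation> abbrev = new ArrayList<Abbreviation>();
>     int startOffset = entity.getBegin();
>     int endOffset = entity.getEnd();
>     // walks every Abbreviation in the document, not just those in the entity
>     Iterator<?> iter = index.getAnnotationIndex(Abbreviation.type).iterator();
>     while (iter.hasNext()) {
>         Abbreviation currAbbrev = (Abbreviation) iter.next();
>         // keep it only if it lies completely inside the entity's span
>         if (currAbbrev.getBegin() >= startOffset
>                 && currAbbrev.getEnd() <= endOffset) {
>             abbrev.add(currAbbrev);
>         }
>     }
>     return abbrev;
> }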
> ----- SNIP --------
>
> Do you see an efficiency disadvantage when using the above function
> instead of the subiterator?
>
> Katrin
>
>
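On why priorities matter for the subiterator at all: below is a rough, self-contained model (not UIMA code; `Ann`, `SubiteratorModel` and the type names are hypothetical) of a strict subiterator. It assumes the index sorts by begin ascending, end descending, then type priority, and that the subiterator starts just after the constraining annotation in that order. An Abbreviation with exactly the same offsets as the Entity is therefore only seen if Entity has the higher priority.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical annotation: a type name plus a begin/end span. */
final class Ann {
    final String type;
    final int begin;
    final int end;

    Ann(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

final class SubiteratorModel {

    /** Assumed index order: begin asc, end desc, then position in the priority list. */
    static Comparator<Ann> indexOrder(List<String> typePriorities) {
        return Comparator.<Ann>comparingInt(a -> a.begin)
                .thenComparing((a, b) -> Integer.compare(b.end, a.end))
                .thenComparingInt(a -> typePriorities.indexOf(a.type));
    }

    /**
     * Rough model of a strict subiterator: walk forward from just after the
     * constraining annotation, keep annotations of the wanted type that end
     * inside it, and stop once one begins after the constraint's end.
     */
    static List<Ann> strictSubiterate(List<Ann> sorted, Ann constraint, String wantedType) {
        List<Ann> out = new ArrayList<>();
        int start = sorted.indexOf(constraint) + 1; // identity match: Ann has no equals()
        for (int i = start; i < sorted.size(); i++) {
            Ann a = sorted.get(i);
            if (a.begin > constraint.end) {
                break;
            }
            if (a.type.equals(wantedType) && a.end <= constraint.end) {
                out.add(a);
            }
        }
        return out;
    }
}
```

With priorities [Entity, Abbreviation] the same-span Abbreviation sorts after the Entity and is returned; with [Abbreviation, Entity] it sorts before it and is silently skipped. An offset-only check like the getAbbreviations function above does not depend on that order. In UIMA itself the real entry point is `AnnotationIndex.subiterator(...)`.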