uima-user mailing list archives

From Philip Ogren <phi...@ogren.info>
Subject Re: Type Priorities
Date Thu, 13 Mar 2008 21:47:19 GMT

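Yes. There is a penalty for iterating through all the annotations of a
given type. Imagine you have a Token annotation type and a document with
10K tokens (not uncommon): a full scan visits every one of those 10K
tokens on each call, no matter how small the enclosing annotation is.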
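To make the cost concrete, here is a minimal plain-Java sketch. It is not the method mentioned in this message and it does not use the UIMA API. Annotations are modelled by a hypothetical `Span` class holding only offsets, kept in the order an annotation index uses (begin ascending, then end descending). Because the list is sorted by begin, a binary search can jump to the first candidate, and the loop can stop as soon as a span starts past the enclosing end, instead of visiting every span the way a linear scan such as Katrin's `getAbbreviations()` does. The result depends only on offsets, so the relative order of annotations with identical offsets (i.e. type priorities) plays no role.

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredSpans {

    // Hypothetical stand-in for an annotation: offsets only, no UIMA types.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Full scan, like the quoted getAbbreviations(): visits every span.
    static List<Span> coveredLinear(List<Span> sorted, int begin, int end) {
        List<Span> out = new ArrayList<>();
        for (Span s : sorted) {
            if (s.begin >= begin && s.end <= end) {
                out.add(s);
            }
        }
        return out;
    }

    // Assumes `sorted` is ordered by begin ascending. Binary-search to the
    // first span with begin >= begin, then stop at the first span that
    // starts after `end`.
    static List<Span> coveredSorted(List<Span> sorted, int begin, int end) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < begin) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<Span> out = new ArrayList<>();
        for (int i = lo; i < sorted.size(); i++) {
            Span s = sorted.get(i);
            if (s.begin > end) {
                break; // every later span starts even further right
            }
            if (s.end <= end) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 10K non-overlapping "tokens", already in index order.
        List<Span> tokens = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            tokens.add(new Span(i * 5, i * 5 + 4));
        }
        List<Span> linear = coveredLinear(tokens, 100, 120);
        List<Span> fast = coveredSorted(tokens, 100, 120);
        System.out.println(linear.equals(fast) + " " + fast.size());
    }
}
```

Running `main` should print `true 4`: both functions find the same four spans inside [100, 120], but the sorted version looks at only a handful of entries instead of all 10,000.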

We wrote a method that doesn't have this performance penalty and 
bypasses the type priorities. 

Please see:



Katrin Tomanek wrote:
> Hi Thilo,
>>> Actually, I am using the subiterator functionality and I need to 
>>> have type priorities to be set. But I don't want the user of a 
>>> component to be able to alter the type priorities by modifying the 
>>> descriptors where type priorities typically are set.
>> is that necessary?  The user mustn't change the type system either,
>> else the annotator will no longer work.  Wouldn't you say it's part
>> of the contract that such metadata is not changed by the user?
> Well, yeah, I could see it that way. But if I know, when writing
> the component, which types it will use and what the priorities should
> look like, why should I risk letting the user violate that "contract"?
> But to come back to the real problem: I am asking about type
> priorities because I am using the subiterator. For me, type priorities
> are necessary when the annotations I am interested in have exactly the
> same offsets as the annotation with which I constrain my subiterator. So
> the question for me is: shall I use the subiterator (and define priorities
> in the component's descriptor), or shall I write my own function (see
> below) that does what I want without the type priority problems:
> ----- SNIP --------
> import java.util.ArrayList;
> import java.util.Iterator;
> import org.apache.uima.jcas.JFSIndexRepository;
>
> // gives me a list of all abbreviations which are "within" my entity
> public ArrayList<Abbreviation> getAbbreviations(Entity entity,
>         JFSIndexRepository index) {
>     ArrayList<Abbreviation> abbrev = new ArrayList<Abbreviation>();
>     int startOffset = entity.getBegin();
>     int endOffset = entity.getEnd();
>     // scans every Abbreviation in the CAS, whatever its offsets
>     Iterator<?> iter = index.getAnnotationIndex(Abbreviation.type).iterator();
>     while (iter.hasNext()) {
>         Abbreviation currAbbrev = (Abbreviation) iter.next();
>         if (currAbbrev.getBegin() >= startOffset
>                 && currAbbrev.getEnd() <= endOffset) {
>             abbrev.add(currAbbrev);
>         }
>     }
>     return abbrev;
> }
> ----- SNIP --------
> Do you see an efficiency disadvantage when using the above function 
> instead of the subiterator?
> Katrin
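The point about identical offsets can be illustrated with a simplified plain-Java model. This is an assumption, not UIMA code: the hypothetical `Ann` class and `sub` method mimic an annotation index ordered by begin ascending, end descending and then type priority, plus a subiterator that only returns annotations positioned after the constraining annotation. With `Entity` listed first in the priorities, an `Abbreviation` with the same offsets sorts after the entity and is returned; with the priorities reversed it sorts before the entity and is silently skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PriorityModel {

    // Hypothetical annotation: a type name plus offsets.
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    // Model of the index order: begin ascending, end descending, then the
    // type's position in the priority list (earlier in the list = first).
    static Comparator<Ann> indexOrder(List<String> priorities) {
        return (x, y) -> {
            if (x.begin != y.begin) {
                return Integer.compare(x.begin, y.begin);
            }
            if (x.end != y.end) {
                return Integer.compare(y.end, x.end);
            }
            return Integer.compare(priorities.indexOf(x.type), priorities.indexOf(y.type));
        };
    }

    // Simplified subiterator: only annotations positioned AFTER the constraint
    // in index order are considered, and only those ending inside it are kept.
    static List<String> sub(List<Ann> sorted, Ann constraint) {
        List<String> out = new ArrayList<>();
        for (int i = sorted.indexOf(constraint) + 1; i < sorted.size(); i++) {
            Ann a = sorted.get(i);
            if (a.begin > constraint.end) {
                break;
            }
            if (a.end <= constraint.end) {
                out.add(a.toString());
            }
        }
        return out;
    }

    // An Entity and an Abbreviation with identical offsets [0,4].
    static List<String> run(List<String> priorities) {
        Ann entity = new Ann("Entity", 0, 4);
        List<Ann> anns = new ArrayList<>();
        anns.add(entity);
        anns.add(new Ann("Abbreviation", 0, 4));
        anns.sort(indexOrder(priorities));
        return sub(anns, entity);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Entity", "Abbreviation")));
        System.out.println(run(Arrays.asList("Abbreviation", "Entity")));
    }
}
```

Running `main` prints `[Abbreviation[0,4]]` and then `[]`. An offset-only filter like the one in getAbbreviations() returns the abbreviation in both cases, which is why it sidesteps type priorities, at the price of the full scan discussed above.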
