uima-user mailing list archives

From Philip Ogren <phi...@ogren.info>
Subject Re: Type Priorities
Date Thu, 13 Mar 2008 21:47:19 GMT
Katrin,

Yes.  There is a penalty for iterating through all the annotations of a 
given type.  Imagine you have a token annotation and a document with 10K 
tokens (not uncommon): every call walks all 10K of them, even when the 
span you care about covers only a handful. 

We wrote a method that doesn't have this performance penalty and 
bypasses the type priorities. 

Please see:

http://cslr.colorado.edu/ClearTK/index.cgi/chrome/site/api/src-html/edu/colorado/cleartk/util/AnnotationRetrieval.html#line.237
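
To make the idea concrete, here is a minimal sketch of one way to avoid the full scan. It is not the ClearTK code linked above and it does not use the UIMA API: the Span class and the covered method are hypothetical stand-ins, and the sketch assumes the annotations are already available as a list sorted by begin offset. It binary-searches for the first candidate and stops as soon as a span starts past the end of the window. Because it compares offsets only, a span with exactly the same offsets as the window is returned without any type priorities being involved.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanLookup {

    /** Hypothetical stand-in for an annotation: just a begin and an end offset. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns the spans that lie inside the window [begin, end].
     * Assumes 'sorted' is ordered by ascending begin offset.
     */
    static List<Span> covered(List<Span> sorted, int begin, int end) {
        // Binary search for the first span whose begin offset is >= the window's begin.
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < begin) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }

        List<Span> result = new ArrayList<>();
        for (int i = lo; i < sorted.size(); i++) {
            Span s = sorted.get(i);
            if (s.begin > end) {
                break; // every later span starts after the window, so stop here
            }
            if (s.end <= end) {
                result.add(s); // starts and ends inside the window
            }
        }
        return result;
    }
}
```

With 10K tokens, a call such as SpanLookup.covered(tokens, 100, 119) takes a few binary-search steps plus the handful of spans near the window, instead of visiting all 10K.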

Philip


Katrin Tomanek wrote:
> Hi Thilo,
>
>>> Actually, I am using the subiterator functionality and I need to 
>>> have type priorities to be set. But I don't want the user of a 
>>> component to be able to alter the type priorities by modifying the 
>>> descriptors where type priorities typically are set.
>>
>> is that necessary?  The user mustn't change the type system either,
>> else the annotator will no longer work.  Wouldn't you say it's part
>> of the contract that such metadata is not changed by the user?
> Well, yes, I could see it that way. But if I know when writing the 
> component which types it will use and what the priorities should 
> look like, why should I risk letting the user violate that "contract"?
>
> But to come back to the real problem: I am asking about type 
> priorities because I am using the subiterator. For me, type priorities 
> are necessary when the annotations I am interested in have exactly the 
> same offsets as the annotation with which I constrain my subiterator. 
> So the question for me is: shall I use the subiterator (and define 
> priorities in the component's descriptor), or shall I write my own 
> function (see below) that does what I want without the type priority 
> problems:
>
> ----- SNIP --------
> // gives me a list of all abbreviations which are "within" my entity
> public ArrayList<Abbreviation> getAbbreviations(Entity entity,
>         JFSIndexRepository index) {
>     ArrayList<Abbreviation> abbrev = new ArrayList<Abbreviation>();
>     int startOffset = entity.getBegin();
>     int endOffset = entity.getEnd();
>     // walks every Abbreviation in the index, wherever it is in the document
>     Iterator iter = index.getAnnotationIndex(Abbreviation.type).iterator();
>     while (iter.hasNext()) {
>         Abbreviation currAbbrev = (Abbreviation) iter.next();
>         // keep only abbreviations whose span lies inside the entity's span
>         if (currAbbrev.getBegin() >= startOffset
>                 && currAbbrev.getEnd() <= endOffset) {
>             abbrev.add(currAbbrev);
>         }
>     }
>     return abbrev;
> }
> ----- SNIP --------
>
> Do you see an efficiency disadvantage when using the above function 
> instead of the subiterator?
>
> Katrin
>
>
