uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Type Priorities
Date Thu, 13 Mar 2008 21:30:31 GMT
Katrin Tomanek wrote:
> Hi Thilo,
> 
>>> Actually, I am using the subiterator functionality and I need
>>> type priorities to be set. But I don't want the user of a component
>>> to be able to alter the type priorities by modifying the descriptors 
>>> where type priorities typically are set.
>>
>> is that necessary?  The user mustn't change the type system either,
>> else the annotator will no longer work.  Wouldn't you say it's part
>> of the contract that such metadata is not changed by the user?
> well, yeah. Could see it that way. But if I know at time of writing the
> component which types it will use and what the priorities should look
> like, why should I risk letting the user violate that "contract"?

One reason is that you want to publish these requirements.  If you
don't declare the necessary type priorities, somebody else might just
define other type priorities, and then one annotator or the other is
going to fail.
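
To make that failure mode concrete: the annotation index sorts by begin offset (ascending), then end offset (descending), and only then by type priority. So for two annotations with identical offsets, the priority list alone decides which one comes first. Here is a rough sketch of that ordering in plain Java. It uses no UIMA classes; Ann, order and sorted are made-up names for illustration, not part of any UIMA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TypePrioritySketch {
    /** Illustrative stand-in for an annotation: a type name plus offsets. */
    public static final class Ann {
        public final String type;
        public final int begin;
        public final int end;
        public Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Begin ascending, end descending, then position in the priority list. */
    public static Comparator<Ann> order(final List<String> priorities) {
        return (a, b) -> {
            if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
            if (a.end != b.end) return Integer.compare(b.end, a.end);
            return Integer.compare(priorities.indexOf(a.type), priorities.indexOf(b.type));
        };
    }

    /** Returns a sorted copy of the given annotations. */
    public static List<Ann> sorted(List<Ann> anns, List<String> priorities) {
        List<Ann> copy = new ArrayList<>(anns);
        copy.sort(order(priorities));
        return copy;
    }

    public static void main(String[] args) {
        // Same offsets, different types: only the priority list breaks the tie.
        List<Ann> anns = Arrays.asList(
                new Ann("Abbreviation", 10, 13),
                new Ann("Entity", 10, 13));
        System.out.println(sorted(anns, Arrays.asList("Entity", "Abbreviation")).get(0).type);
        System.out.println(sorted(anns, Arrays.asList("Abbreviation", "Entity")).get(0).type);
    }
}
```

With the first priority list the Entity sorts before the Abbreviation, with the second it is the other way round. Code that relies on one of these orders, such as a subiterator over an Entity, will see different results once somebody else's priorities are in effect.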

Now quite frankly, I don't know what will happen if two annotators
declare conflicting type priorities in their descriptor.  One would
hope that UIMA will raise an error, just like when conflicting types
are defined.  Anyway, only if you declare in your descriptor what
type priorities you expect is there a chance to resolve any conflicts.

The method mentioned by Steven, btw, does not work from inside an
annotator.  You can only do that before you actually construct the
analysis engine, in the application that loads the descriptor.  Once
the AE has been built, it cannot be modified like this.

Having said all that, for the issue you're describing I would go
with custom code.  See comment below.

> 
> But to come back to the true problem: I am asking about type priorities 
> because I am using the subiterator. For me, type priorities are 
> necessary when the types I am interested in are of exactly the same 
> offset as the type with which I constrain my subiterator. So, the 
> question for me is: shall I use the subiterator (and define priorities 
> in the component's descriptor) or shall I write my own function (see 
> below) doing what I want without the type priority problems:
> 
> ----- SNIP --------
> // gives me a list of all abbreviations which are "within" my entity
> public ArrayList<Abbreviation> getAbbreviations(Entity entity,
>         JFSIndexRepository index) {
>     ArrayList<Abbreviation> abbrev = new ArrayList<Abbreviation>();
>     int startOffset = entity.getBegin();
>     int endOffset = entity.getEnd();
>     Iterator iter = index.getAnnotationIndex(Abbreviation.type).iterator();
>     while (iter.hasNext()) {
>         Abbreviation currAbbrev = (Abbreviation) iter.next();
>         if (currAbbrev.getBegin() >= startOffset
>                 && currAbbrev.getEnd() <= endOffset) {
>             abbrev.add(currAbbrev);
>         }
>     }
>     return abbrev;
> }
> ----- SNIP --------
> 
> Do you see an efficiency disadvantage when using the above function 
> instead of the subiterator?

This can be done slightly more efficiently.  I'm just typing this in
from memory, so it will probably not work off the bat.

// gives me a list of all abbreviations which are "within" my entity
// (needs import org.apache.uima.cas.FSIterator)
public ArrayList<Abbreviation> getAbbreviations(Entity entity, JFSIndexRepository index)
{
    ArrayList<Abbreviation> abbrev = new ArrayList<Abbreviation>();
    int startOffset = entity.getBegin();
    int endOffset = entity.getEnd();
    // Get the JCas from somewhere; create temporary abbrev to position iterator
    Abbreviation tmpAbbrev = new Abbreviation(jcas, startOffset, endOffset);
    // FSIterator rather than java.util.Iterator, since only FSIterator has moveTo()
    FSIterator iter = index.getAnnotationIndex(Abbreviation.type).iterator();
    // Move the iterator to the right position
    iter.moveTo(tmpAbbrev);
    while (iter.hasNext()) {
        Abbreviation currAbbrev = (Abbreviation) iter.next();
        // Terminate early; nothing to be found after end of entity boundaries
        if (currAbbrev.getBegin() > endOffset) {
            break;
        }
        // Don't need to check start position, this is guaranteed by iterator positioning
        if (currAbbrev.getEnd() <= endOffset) {
            abbrev.add(currAbbrev);
        }
    }
    return abbrev;
}

Untested, use with caution ;-)
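
If you want to try the positioning idea without a CAS at hand, here is a minimal sketch of the same two tricks (jump to the start position, stop once you are past the end) on a plain sorted list. Span and within are hypothetical names invented for this example, and the binary search only plays the role that moveTo() plays on the real index. It assumes the list is already sorted by begin offset, which is what the annotation index gives you.

```java
import java.util.ArrayList;
import java.util.List;

public class WithinSketch {
    /** Illustrative stand-in for an annotation's offsets. */
    public static final class Span {
        public final int begin;
        public final int end;
        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns all spans that lie within [begin, end].
     * Assumes the input list is sorted by begin offset, ascending.
     */
    public static List<Span> within(List<Span> sorted, int begin, int end) {
        // Binary search for the first span whose begin is >= the requested begin.
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < begin) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<Span> result = new ArrayList<>();
        for (int i = lo; i < sorted.size(); i++) {
            Span s = sorted.get(i);
            // Terminate early: every later span starts even further right.
            if (s.begin > end) {
                break;
            }
            // The begin check is implied by the positioning above.
            if (s.end <= end) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Compared with scanning the whole index, this looks only at the spans that start inside the entity, which is where the saving comes from when the document is long and the entity is short.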

--Thilo


> 
> Katrin

