uima-user mailing list archives

From Thomas Ginter <thomas.gin...@utah.edu>
Subject Re: FilteredIterator is very slow
Date Mon, 31 Mar 2014 19:56:23 GMT
Larry,

A faster way to get the annotations of each type you want to skip is to request a type-specific index:

FSIndex<TitlePersonHonorificAnnotation> titlePersonHAIndex = aJCas.getAnnotationIndex(TitlePersonHonorificAnnotation.type);

Doing this for each type yields an index that contains only the annotations of that type
in the CAS.  From each index you can get an iterator ( titlePersonHAIndex.iterator() ) and
either traverse the iterators separately or add their annotations to a common Collection,
such as an ArrayList, and iterate through that.  You could also take advantage of the fact
that the default annotation index in UIMA sorts by ascending begin offset and then by
descending end offset: once you reach an annotation that begins after the end of the
dictTerm, you can stop, because no later annotation can overlap it.
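
The early stop can be sketched roughly as follows.  This is plain Java rather than UIMA
code: Span is a hypothetical stand-in for an annotation's begin/end offsets, INDEX_ORDER
imitates the default index order, and the overlap test assumes half-open offsets, which may
differ from what UIMAUtils.annotationsOverlap does.  Note that annotations collected from
several per-type indexes into one list need to be re-sorted before the early stop is valid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedOverlapSketch {

    /** Hypothetical stand-in for an annotation: only the offsets matter here. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Imitates the default index order: ascending begin, then descending end. */
    static final Comparator<Span> INDEX_ORDER = (a, b) ->
            a.begin != b.begin
                    ? Integer.compare(a.begin, b.begin)
                    : Integer.compare(b.end, a.end);

    /** Returns true if any span in sortedSkips overlaps term (half-open offsets assumed). */
    static boolean overlapsAny(Span term, List<Span> sortedSkips) {
        for (Span skip : sortedSkips) {
            if (skip.begin >= term.end) {
                // Every later span begins at least this far right, so none can overlap.
                break;
            }
            if (skip.end > term.begin) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Span> skips = new ArrayList<>();
        skips.add(new Span(40, 50));
        skips.add(new Span(0, 5));
        skips.add(new Span(10, 20));
        skips.sort(INDEX_ORDER); // needed because the merged list is not in index order

        System.out.println(overlapsAny(new Span(15, 25), skips)); // true
        System.out.println(overlapsAny(new Span(5, 10), skips));  // false
    }
}
```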
 

An important design decision, though, is whether the dictTerm annotations are much more
numerous than the TitlePersonHonorificAnnotation, MeasurementAnnotation, and ProgFactorTerm
filter annotations.  Generally, if the filter annotations are much more plentiful and the
dictTerm annotations are rarer, then looking for overlapping filter annotations for each
dictTerm will yield fewer iterations of your algorithm.  However, if there are many dictTerm
occurrences and only a few filter annotations, it may be more efficient to iterate through
the filter annotations and eliminate the dictTerms that they overlap or cover.
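
The second direction (few filter annotations, many dictTerms) might look something like
this.  Again this is plain Java with a hypothetical Span class in place of the real
annotation types, and the half-open overlap test is an assumption.  The outer loop runs once
per filter span, and the candidate list shrinks as terms are eliminated.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterDirectionSketch {

    /** Hypothetical stand-in for an annotation: only the offsets matter here. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Overlap test for half-open offsets; covering is a special case of overlapping. */
    static boolean overlaps(Span a, Span b) {
        return a.begin < b.end && b.begin < a.end;
    }

    /** Returns a copy of dictTerms without the terms that any filter span overlaps. */
    static List<Span> keepUnfiltered(List<Span> dictTerms, List<Span> filters) {
        List<Span> kept = new ArrayList<>(dictTerms); // leave the caller's list untouched
        for (Span filter : filters) {
            kept.removeIf(term -> overlaps(term, filter));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> dictTerms = new ArrayList<>();
        dictTerms.add(new Span(0, 5));
        dictTerms.add(new Span(10, 20));
        dictTerms.add(new Span(30, 40));

        List<Span> filters = new ArrayList<>();
        filters.add(new Span(12, 15));

        System.out.println(keepUnfiltered(dictTerms, filters).size()); // 2
    }
}
```

In the worst case the number of pairwise checks is the same in both directions, so it is
worth measuring on real documents before committing to one.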

Thanks,

Thomas Ginter
801-448-7676
thomas.ginter@utah.edu




On Mar 31, 2014, at 11:47 AM, Kline, Larry <Larry.Kline@mckesson.com> wrote:

> When I use a filtered FSIterator it's an order of magnitude slower than a non-filtered
> iterator.  Here's my code:
> 
> Create the iterator:
>       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
>              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>              FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
>              constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
>              constraint.add((new MeasurementAnnotation(aJCas)).getType());
>              constraint.add((new ProgFactorTerm(aJCas)).getType());
>              it = aJCas.createFilteredIterator(it, constraint);
>              return it;
>       }
> Use the iterator:
>       public void process(JCas aJCas) throws AnalysisEngineProcessException {
>              ...
> // The following is done in a loop
>                           if (shouldSkip(dictTerm, skipIter))
>                                  continue;
>              ...
>       }
> Here's the method called:
>       private boolean shouldSkip(G2DictTerm dictTerm, FSIterator<Annotation> skipIter) throws CASException {
>              boolean shouldSkip = false;
>              skipIter.moveToFirst();
>              while (skipIter.hasNext()) {
>                     Annotation annotation = skipIter.next();
>                     if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
>                           shouldSkip = true;
>                           break;
>                     }
>              }
>              return shouldSkip;
>       }
> 
> If I change the method, createConstrainedIterator(), to this (that is, no constraints):
>       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
>              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
>              return it;
>       }
> 
> It runs literally 10 times faster.  Doing some profiling I see that all of the time is
> spent in the skipIter.moveToFirst() call.  I also tried creating the filtered iterator each
> time anew in the shouldSkip() method instead of passing it in, but that has even slightly
> worse performance.
> 
> Given this performance I suppose I should probably use a non-filtered iterator and just
> check for the types I'm interested in inside the loop.
> 
> Any other suggestions welcome.
> 
> Thanks,
> Larry Kline
> 
> 

