uima-user mailing list archives

From "Kline, Larry" <Larry.Kl...@mckesson.com>
Subject FilteredIterator is very slow
Date Mon, 31 Mar 2014 17:47:56 GMT
When I use a filtered FSIterator, it's an order of magnitude slower than a non-filtered iterator.
Here's my code:

Create the iterator:
       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
              FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
              constraint.add((new TitlePersonHonorificAnnotation(aJCas)).getType());
              constraint.add((new MeasurementAnnotation(aJCas)).getType());
              constraint.add((new ProgFactorTerm(aJCas)).getType());
              it = aJCas.createFilteredIterator(it, constraint);
              return it;
       }
Use the iterator:
       public void process(JCas aJCas) throws AnalysisEngineProcessException {
              ...
                           // The following is done in a loop
                           if (shouldSkip(dictTerm, skipIter))
                                  continue;
              ...
       }
Here's the method called:
       private boolean shouldSkip(G2DictTerm dictTerm, FSIterator<Annotation> skipIter) throws CASException {
              boolean shouldSkip = false;
              skipIter.moveToFirst();
              while (skipIter.hasNext()) {
                     Annotation annotation = skipIter.next();
                     if (UIMAUtils.annotationsOverlap(dictTerm, annotation)) {
                           shouldSkip = true;
                           break;
                     }
              }
              return shouldSkip;
       }

If I change the method, createConstrainedIterator(), to this (that is, no constraints):
       private FSIterator<Annotation> createConstrainedIterator(JCas aJCas) throws CASException {
              FSIterator<Annotation> it = aJCas.getAnnotationIndex().iterator();
              return it;
       }

It runs literally 10 times faster.  Doing some profiling I see that all of the time is spent
in the skipIter.moveToFirst() call.  I also tried creating the filtered iterator each time
anew in the shouldSkip() method instead of passing it in, but that has even slightly worse
performance.
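
Since the cost is in repositioning the iterator rather than in the loop body, one thing that might help is to walk the filtered iterator once per document and keep the matches in a plain list. The following is only a sketch of that idea under stated assumptions: it uses a hypothetical stand-in class Span and a java.util.Iterator in place of the UIMA Annotation and FSIterator types so that it compiles without UIMA, and its overlap test (half-open offsets) is an assumption about what UIMAUtils.annotationsOverlap() does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a UIMA annotation: only the offsets matter here.
class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Drains the (expensive) filtered iterator once and answers later checks from a list.
class CachedSkipCheck {
    private final List<Span> skipSpans = new ArrayList<>();

    // 'filtered' plays the role of the filtered FSIterator; it is walked exactly once.
    CachedSkipCheck(Iterator<Span> filtered) {
        while (filtered.hasNext()) {
            skipSpans.add(filtered.next());
        }
    }

    // Assumed overlap rule: the two half-open ranges [begin, end) share at least one offset.
    boolean shouldSkip(Span term) {
        for (Span s : skipSpans) {
            if (term.begin < s.end && s.begin < term.end) {
                return true;
            }
        }
        return false;
    }
}
```

The list is a snapshot, so this only fits if no annotations of the three types are added to the indexes between building it and the last shouldSkip() call.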

Given this performance I suppose I should probably use a non-filtered iterator and just check
for the types I'm interested in inside the loop.
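
A minimal sketch of what that fallback could look like. To keep it compilable without UIMA on the classpath, the classes below are hypothetical stand-ins for the JCas cover classes (OtherAnnotation is invented purely for illustration), an Iterable replaces the unfiltered FSIterator, and overlaps() is an assumed approximation of UIMAUtils.annotationsOverlap(). With the real classes the type test would be the same instanceof check, which, like a type constraint, also accepts subtypes.

```java
// Hypothetical stand-ins for the UIMA JCas cover classes used in the code above.
class Annotation {
    final int begin;
    final int end;

    Annotation(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

class TitlePersonHonorificAnnotation extends Annotation {
    TitlePersonHonorificAnnotation(int begin, int end) { super(begin, end); }
}

class MeasurementAnnotation extends Annotation {
    MeasurementAnnotation(int begin, int end) { super(begin, end); }
}

class ProgFactorTerm extends Annotation {
    ProgFactorTerm(int begin, int end) { super(begin, end); }
}

// Invented type that is NOT in the skip set, used only to exercise the filter.
class OtherAnnotation extends Annotation {
    OtherAnnotation(int begin, int end) { super(begin, end); }
}

class UnfilteredSkipCheck {
    // Replaces the FSTypeConstraint: true for the three types of interest and their subtypes.
    static boolean isSkipType(Annotation a) {
        return a instanceof TitlePersonHonorificAnnotation
                || a instanceof MeasurementAnnotation
                || a instanceof ProgFactorTerm;
    }

    // Assumed overlap rule: the two half-open ranges [begin, end) share at least one offset.
    static boolean overlaps(Annotation a, Annotation b) {
        return a.begin < b.end && b.begin < a.end;
    }

    // Walks every annotation and applies the type test inside the loop.
    static boolean shouldSkip(Annotation dictTerm, Iterable<Annotation> all) {
        for (Annotation a : all) {
            if (isSkipType(a) && overlaps(dictTerm, a)) {
                return true;
            }
        }
        return false;
    }
}
```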

Any other suggestions welcome.

Thanks,
Larry Kline


