ctakes-notifications mailing list archives

From "Dmitriy Dligach (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CTAKES-449) PolarityCleartkAnalysisEngine slow for large documents
Date Fri, 30 Jun 2017 15:00:02 GMT
Dmitriy Dligach created CTAKES-449:
--------------------------------------

             Summary: PolarityCleartkAnalysisEngine slow for large documents
                 Key: CTAKES-449
                 URL: https://issues.apache.org/jira/browse/CTAKES-449
             Project: cTAKES
          Issue Type: Improvement
          Components: ctakes-assertion
            Reporter: Dmitriy Dligach


As soon as I add the negation AE to the end of my pipeline:
aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() );

The pipeline becomes 50-100 times slower. This likely has to do with the line:
List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class,
    entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

in AssertionCleartkAnalysisEngine. I am running the pipeline on large files (i.e., files with a
large number of sentences). The slowdown is caused by this call scanning all sentences in the
document for each identified annotation, so the total work grows with the number of annotations
times the number of sentences.
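To illustrate why a per-mention covering lookup scales badly on long documents, here is a minimal self-contained sketch. It is not cTAKES or uimaFIT code: `Span`, `coveringNaive`, and `SentenceIndex` are hypothetical names, and the sketch assumes sentences do not overlap. It contrasts a full scan of the sentence list per mention, O(M * S) for M mentions and S sentences, with sorting sentence offsets once per document and binary-searching per mention, O((S + M) log S).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoveringLookupSketch {

    // Hypothetical stand-in for a UIMA annotation: character offsets [begin, end).
    record Span(int begin, int end) {}

    // Naive lookup, analogous to one covering-select call per mention: every call
    // scans the whole sentence list, so M mentions over S sentences cost O(M * S).
    static List<Span> coveringNaive(List<Span> sentences, Span mention) {
        List<Span> result = new ArrayList<>();
        for (Span s : sentences) {
            if (s.begin() <= mention.begin() && s.end() >= mention.end()) {
                result.add(s);
            }
        }
        return result;
    }

    // Indexed lookup: sort sentences by begin offset once per document, then
    // binary-search per mention. Assumes sentences do not overlap, so at most
    // one sentence can cover a given mention.
    static final class SentenceIndex {
        private final Span[] sorted;
        private final int[] begins;

        SentenceIndex(List<Span> sentences) {
            sorted = sentences.toArray(new Span[0]);
            Arrays.sort(sorted, (a, b) -> Integer.compare(a.begin(), b.begin()));
            begins = new int[sorted.length];
            for (int i = 0; i < sorted.length; i++) {
                begins[i] = sorted[i].begin();
            }
        }

        List<Span> covering(Span mention) {
            // Locate the last sentence whose begin offset is <= the mention's begin.
            int idx = Arrays.binarySearch(begins, mention.begin());
            if (idx < 0) {
                idx = -idx - 2; // (insertion point) - 1
            }
            if (idx >= 0 && sorted[idx].end() >= mention.end()) {
                return List.of(sorted[idx]);
            }
            return List.of();
        }
    }

    public static void main(String[] args) {
        // Synthetic document: 2,000 sentences, each 49 characters long.
        List<Span> sentences = new ArrayList<>();
        for (int i = 0; i < 2_000; i++) {
            sentences.add(new Span(i * 50, i * 50 + 49));
        }
        // Two mentions inside each sentence, plus one crossing each sentence boundary.
        List<Span> mentions = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            mentions.add(new Span(i * 50 + 2, i * 50 + 10));
            mentions.add(new Span(i * 50 + 20, i * 50 + 30));
            if (i + 1 < sentences.size()) {
                mentions.add(new Span(i * 50 + 45, i * 50 + 55));
            }
        }
        SentenceIndex index = new SentenceIndex(sentences); // built once per document
        int mismatches = 0;
        for (Span m : mentions) {
            if (!coveringNaive(sentences, m).equals(index.covering(m))) {
                mismatches++;
            }
        }
        System.out.println("mentions=" + mentions.size() + ", mismatches=" + mismatches);
    }
}
```

The point of the design is to move the per-document work (sorting sentences) out of the per-mention loop. uimaFIT's `JCasUtil` also provides `indexCovering`, which precomputes covering annotations for every annotation of a type in a single call; whether that or some other approach was used to resolve this issue is not stated here.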

The full pipeline is here:
https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
