ctakes-notifications mailing list archives

From "Sean Finan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-449) PolarityCleartkAnalysisEngine slow for large documents
Date Mon, 10 Jul 2017 13:24:00 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080318#comment-16080318 ]

Sean Finan commented on CTAKES-449:

See the dev list emails. The root cause of the issue has been identified, but it still needs to be fixed.

> PolarityCleartkAnalysisEngine slow for large documents
> ------------------------------------------------------
>                 Key: CTAKES-449
>                 URL: https://issues.apache.org/jira/browse/CTAKES-449
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-assertion
>            Reporter: Dmitriy Dligach
> As soon as I add the negation AE at the end of my pipeline:
> aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() );
> the pipeline becomes 50-100 times slower. This likely has to do with the line:
> List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));
> in AssertionCleartkAnalysisEngine. I am running the pipeline on large files (i.e. files with a large number of sentences). The slowdown is caused by the code obtaining all sentences in a document for each identified annotation.
> The full pipeline is here:
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
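
To illustrate the kind of cost described in the report, here is a minimal, self-contained sketch that does not use UIMA at all. It compares a per-mention linear scan over all sentences with a lookup against an index built once per document. The names Span, CoveringSentenceIndex, findCovering and findCoveringNaive are hypothetical and exist only for this example, and the sketch assumes sentences are sorted by begin offset and do not overlap. It is not the fix applied in cTAKES. In a real pipeline, uimaFIT's JCasUtil.indexCovering may be one way to build a comparable map in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a UIMA annotation: a character span [begin, end]. */
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

/** Illustrative only; not the cTAKES or uimaFIT implementation. */
final class CoveringSentenceIndex {
    private final int[] begins;
    private final int[] ends;

    /** Built once per document. Assumes sentences are sorted by begin and do not overlap. */
    CoveringSentenceIndex(List<Span> sentences) {
        begins = new int[sentences.size()];
        ends = new int[sentences.size()];
        for (int i = 0; i < sentences.size(); i++) {
            begins[i] = sentences.get(i).begin;
            ends[i] = sentences.get(i).end;
        }
    }

    /** Returns the index of the sentence covering [begin, end], or -1 if there is none. */
    int findCovering(int begin, int end) {
        int pos = Arrays.binarySearch(begins, begin);
        if (pos < 0) {
            // Not found: convert the encoded insertion point into the
            // index of the last sentence that starts before 'begin'.
            pos = -pos - 2;
        }
        if (pos < 0) {
            return -1; // the mention starts before the first sentence
        }
        return ends[pos] >= end ? pos : -1;
    }

    /** Baseline: walks every sentence for a single mention, as a linear scan would. */
    static int findCoveringNaive(List<Span> sentences, int begin, int end) {
        for (int i = 0; i < sentences.size(); i++) {
            Span s = sentences.get(i);
            if (s.begin <= begin && end <= s.end) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Five made-up sentences of ten characters each.
        List<Span> sentences = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            sentences.add(new Span(i * 10, i * 10 + 9));
        }
        CoveringSentenceIndex index = new CoveringSentenceIndex(sentences);
        Span mention = new Span(23, 27);
        System.out.println("naive:   " + findCoveringNaive(sentences, mention.begin, mention.end));
        System.out.println("indexed: " + index.findCovering(mention.begin, mention.end));
    }
}
```

With M mentions and S sentences, the naive variant does on the order of M * S comparisons, while the indexed variant does one pass over the sentences plus a binary search per mention. Whether this matches the actual hot spot in AssertionCleartkAnalysisEngine would need to be confirmed by profiling.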

This message was sent by Atlassian JIRA
