ctakes-dev mailing list archives

From "John Green" <john.travis.gr...@gmail.com>
Subject Re: cTakes Annotation Comparison
Date Fri, 19 Dec 2014 11:50:05 GMT
Wow, great work. Thank you for sharing. 


John Green
—
Sent from Mailbox

On Thu, Dec 18, 2014 at 6:08 PM, Bruce Tietjen
<bruce.tietjen@perfectsearchcorp.com> wrote:

> Actually, we are working on a similar tool to compare it to the human
> adjudicated standard for the set we tested against. I didn't mention it
> before because the tool isn't complete yet, but initial results for the set
> (excluding annotations marked as "CUI-less") were as follows:
> Human adjudicated annotations: 4591 (excluding CUI-less)
> Annotations found matching the human adjudicated standard:
> UMLSProcessor        2,245
> FastUMLSProcessor      215
> Bruce Tietjen
> Senior Software Engineer, IMAT Solutions <http://imatsolutions.com>
> Mobile: 801.634.1547
> bruce.tietjen@imatsolutions.com
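
To put those counts in perspective, treating each matched annotation as a true positive against the 4,591 adjudicated annotations gives a rough recall figure. This is a sketch only: it ignores precision and any partial-span matching the tool may do.

```python
# Rough recall of each pipeline against the human-adjudicated standard
# (4,591 annotations, CUI-less excluded), using the counts reported above.
gold = 4591
matches = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}

for name, hits in matches.items():
    print(f"{name}: recall ~ {hits / gold:.1%}")
# → UMLSProcessor: recall ~ 48.9%
# → FastUMLSProcessor: recall ~ 4.7%
```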
> On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu>
> wrote:
>>
>> Bruce,
>> Thanks for this-- very useful.
>> Perhaps Sean Finan can comment more,
>> but it's also probably worth comparing to an adjudicated, human-annotated
>> gold standard.
>>
>> --Pei
>>
>> -----Original Message-----
>> From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
>> Sent: Thursday, December 18, 2014 1:45 PM
>> To: dev@ctakes.apache.org
>> Subject: cTakes Annotation Comparison
>>
>> With the recent release of cTakes 3.2.1, we were very interested in
>> checking for any differences in annotations between using the
>> AggregatePlaintextUMLSProcessor pipeline and the
>> AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes
>> with its associated set of UMLS resources.
>>
>> We chose to use the SHARE 14-a-b Training data, which consists of 199
>> documents (Discharge 61, ECG 54, Echo 42, and Radiology 42), as the basis
>> for the comparison.
>>
>> We decided to share a summary of the results with the development
>> community.
>>
>> Documents Processed: 199
>>
>> Processing Time:
>> UMLSProcessor        2,439 seconds
>> FastUMLSProcessor    1,837 seconds
>>
>> Total Annotations Reported:
>> UMLSProcessor        20,365 annotations
>> FastUMLSProcessor     8,284 annotations
>>
>>
>> Annotation Comparisons:
>> Annotations common to both sets:                      3,940
>> Annotations reported only by the UMLSProcessor:      16,425
>> Annotations reported only by the FastUMLSProcessor:   4,344
>>
>>
>> If anyone is interested, the following was our test procedure:
>>
>> We used the UIMA CPE to process the document set twice, once using the
>> AggregatePlaintextUMLSProcessor pipeline and once using the
>> AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile
>> CAS consumer to write the results to output files.
>>
>> We used a tool we recently developed to analyze and compare the
>> annotations generated by the two pipelines. The tool compares the two
>> outputs for each file and reports any differences in the annotations
>> (MedicationMention, SignSymptomMention, ProcedureMention,
>> AnatomicalSiteMention, and
>> DiseaseDisorderMention) between the two output sets. The tool reports the
>> number of 'matches' and 'misses' between each annotation set. A 'match' is
>> defined as the presence of an identified source text interval with its
>> associated CUI appearing in both annotation sets. A 'miss' is defined as
>> the presence of an identified source text interval and its associated CUI
>> in one annotation set, but no matching identified source text interval and
>> CUI in the other. The tool also reports the total number of annotations
>> (source text intervals with associated CUIs) reported in each annotation
>> set. The compare tool is in our GitHub repository at
>> https://github.com/perfectsearch/cTAKES-compare
>>
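
The match/miss logic described above can be sketched in a few lines. This is a simplified, illustrative stand-in for the actual cTAKES-compare tool, assuming each annotation reduces to a (begin offset, end offset, CUI) triple; the CUIs and offsets below are arbitrary examples.

```python
# Simplified sketch of the match/miss comparison described above.
# An annotation is reduced to (begin_offset, end_offset, cui). A "match"
# is a triple present in both annotation sets; a "miss" is a triple
# present in only one of them.

def compare(set_a, set_b):
    a, b = set(set_a), set(set_b)
    return {
        "matches": len(a & b),          # same span and same CUI in both sets
        "misses_a_only": len(a - b),    # reported only by pipeline A
        "misses_b_only": len(b - a),    # reported only by pipeline B
    }

# Hypothetical annotations from the two pipelines for one document.
umls = {(0, 5, "C0011849"), (10, 18, "C0020538")}
fast = {(0, 5, "C0011849"), (25, 31, "C0027051")}
print(compare(umls, fast))
# → {'matches': 1, 'misses_a_only': 1, 'misses_b_only': 1}
```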