ctakes-dev mailing list archives

From Kim Ebert <kim.eb...@perfectsearchcorp.com>
Subject Re: cTakes Annotation Comparison
Date Fri, 19 Dec 2014 22:14:13 GMT
Bruce,

I think we all feel a lot better now. I think the tool will be helpful
moving forward.

I've updated the git repo with the fix in case anyone is interested.

IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.ebert@imatsolutions.com <mailto:greg.hubert@imatsolutions.com>
On 12/19/2014 03:04 PM, Bruce Tietjen wrote:
> My apologies to Sean and everyone,
>
> I am happy to report that I found a bug in our analysis tools that was
> missing the last FSArray entry for any FSArray list.
>
> With the bug fixed, the results look MUCH better.
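The thread does not show the faulty code, so the following is only a hypothetical illustration of the kind of off-by-one error that drops the last element of an array-like collection. A plain Python list stands in for a UIMA FSArray, the CUIs are placeholders, and both function names are invented.

```python
def collect_buggy(entries):
    """Hypothetical bug: the loop bound stops one short of the end."""
    out = []
    for i in range(len(entries) - 1):  # never visits the last index
        out.append(entries[i])
    return out


def collect_fixed(entries):
    """Visit every index, including the last one."""
    return [entries[i] for i in range(len(entries))]


cuis = ["C0000001", "C0000002", "C0000003"]  # placeholder CUIs
print(collect_buggy(cuis))
print(collect_fixed(cuis))
```

Under this assumption a collection with a single entry would yield nothing at all, which may be part of why the corrected counts moved so much.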
>
> UMLSProcessor found 31,598 annotations
> FastUMLSProcessor found 30,716 annotations
>
> There were 23,522 annotations that were exact matches between the two.
>
> When comparing with the gold standard annotations (4,591 annotations):
>
> UMLSProcessor found 2,632 matches (2,735 including overlaps)
> FastUMLSProcessor found 2,795 matches (2,842 including overlaps)
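The counts above can be turned into rough precision and recall figures. The sketch below assumes that an exact match counts as a true positive and that every reported annotation counts toward precision. The second assumption is probably pessimistic, since the pipelines may emit annotation types or CUIs that the gold standard does not cover. The function name `prf` is made up for illustration.

```python
def prf(matches, reported, gold):
    """Return (precision, recall, F1) computed from raw counts."""
    precision = matches / reported
    recall = matches / gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


GOLD = 4591  # gold standard annotations, excluding CUI-less

# (exact matches, total annotations reported), taken from the message above
runs = {
    "UMLSProcessor": (2632, 31598),
    "FastUMLSProcessor": (2795, 30716),
}

results = {name: prf(m, n, GOLD) for name, (m, n) in runs.items()}
for name, (p, r, f1) in results.items():
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

With these assumptions, recall comes out at roughly 0.57 for the UMLSProcessor and 0.61 for the FastUMLSProcessor.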
>
>
>
>
>
>
>  [image: IMAT Solutions] <http://imatsolutions.com>
>  Bruce Tietjen
> Senior Software Engineer
> [image: Mobile:] 801.634.1547
> bruce.tietjen@imatsolutions.com
>
> On Fri, Dec 19, 2014 at 1:49 PM, Bruce Tietjen <
> bruce.tietjen@perfectsearchcorp.com> wrote:
>> I'll do that -- there is always a possibility of bugs in the analysis
>> tool.
>>
>>
>>  [image: IMAT Solutions] <http://imatsolutions.com>
>>  Bruce Tietjen
>> Senior Software Engineer
>> [image: Mobile:] 801.634.1547
>> bruce.tietjen@imatsolutions.com
>>
>> On Fri, Dec 19, 2014 at 1:39 PM, Finan, Sean <
>> Sean.Finan@childrens.harvard.edu> wrote:
>>>  Sorry, I meant “Do some spot checks on the validity”.  In other words,
>>> when your script reports that a cui and/or span is missing, manually look
>>> at the data and see if it really is.  Just open up one .xmi in the CVD and
>>> see what it looks like.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Sean
>>>
>>>
>>>
>>> *From:* Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
>>> *Sent:* Friday, December 19, 2014 3:37 PM
>>> *To:* dev@ctakes.apache.org
>>> *Subject:* Re: cTakes Annotation Comparison
>>>
>>>
>>>
>>> My original results were using a newly downloaded cTakes 3.2.1 with the
>>> separately downloaded resources copied in. There were no changes to any of
>>> the configuration files.
>>>
>>> As far as this last run, I modified the UMLSLookupAnnotator.xml and
>>> AggregatePlaintextFastUMLSProcessor.xml.  I've attached the modified ones I
>>> used (but they may not get through the mailing list).
>>>
>>>
>>>
>>>
>>>
>>>
>>> [image: Image removed by sender. IMAT Solutions]
>>> <http://imatsolutions.com>
>>>
>>> *Bruce Tietjen*
>>> Senior Software Engineer
>>> [image: Image removed by sender. Mobile:]801.634.1547
>>> bruce.tietjen@imatsolutions.com
>>>
>>>
>>>
>>> On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean <
>>> Sean.Finan@childrens.harvard.edu> wrote:
>>>
>>> Hi Bruce,
>>>
>>> I'm not sure how there would be fewer matches with the overlap
>>> processor.  There should be all of the matches from the non-overlap
>>> processor plus those from the overlap.  Decreasing from 215 to 211 is
>>> strange.  Have you done any manual spot checks on this?  It is really
>>> bizarre that you'd only have two matches per document (100 docs?).
>>>
>>> Thanks,
>>> Sean
>>>
>>> -----Original Message-----
>>> From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
>>> Sent: Friday, December 19, 2014 3:23 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: cTakes Annotation Comparison
>>>
>>> Sean,
>>>
>>> I tried the configuration changes you mentioned in your earlier email.
>>>
>>> The results are as follows:
>>>
>>> Total Annotations found: 12,161 (default configuration found 8,284)
>>>
>>> If counting exact span matches, this run only matched 211 (default
>>> configuration matched 215).
>>>
>>> If counting overlapping spans, this run only matched 220 (default
>>> configuration matched 224)
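The difference between the two counting modes can be sketched as two span predicates. This assumes spans are half-open character offsets `[begin, end)`; the compare tool may define them differently, and both function names are invented for illustration.

```python
def is_exact(a_begin, a_end, b_begin, b_end):
    """True if both spans cover exactly the same characters."""
    return (a_begin, a_end) == (b_begin, b_end)


def overlaps(a_begin, a_end, b_begin, b_end):
    """True if the half-open spans share at least one character."""
    return a_begin < b_end and b_begin < a_end


print(is_exact(2, 6, 2, 6), overlaps(0, 5, 3, 8), overlaps(0, 5, 5, 8))
```

Every exact match of a non-empty span is also an overlap, so the overlap count should never be lower than the exact count for the same run, which agrees with the figures above (220 vs 211, and 224 vs 215).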
>>>
>>> Bruce
>>>
>>>
>>>
>>>  [image: IMAT Solutions] <http://imatsolutions.com>  Bruce Tietjen
>>> Senior Software Engineer
>>> [image: Mobile:] 801.634.1547
>>> bruce.tietjen@imatsolutions.com
>>>
>>> On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <
>>> Pei.Chen@childrens.harvard.edu>
>>> wrote:
>>>>  Kim,
>>>>
>>>> Maintenance, not bugs/issues, is the factor in deciding to forge ahead.
>>>>
>>>> They are 2 components that do the same thing with the same goal. (As
>>>> Sean mentioned, one should be able to configure the new code base to
>>>> replicate the old algorithm if required; it’s just a simpler and
>>>> cleaner code base. If this is not the case or if there are issues, we
>>>> should fix it and move forward.)
>>>>
>>>> We can keep the old component around for as long as needed, but it’s
>>>> likely going to have limited support…
>>>>
>>>> --Pei
>>>>
>>>>
>>>>
>>>> *From:* Kim Ebert [mailto:kim.ebert@imatsolutions.com]
>>>> *Sent:* Friday, December 19, 2014 1:47 PM
>>>> *To:* Chen, Pei; dev@ctakes.apache.org
>>>>
>>>> *Subject:* Re: cTakes Annotation Comparison
>>>>
>>>>
>>>>
>>>> Pei,
>>>>
>>>> I don't think bugs/issues should be part of determining if one
>>>> algorithm vs the other is superior. Obviously, it is worth mentioning
>>>> the bugs, but if the fast lookup method has worse precision and recall
>>>> but better performance, vs the slower but more accurate first word
>>>> lookup algorithm, then time should be invested in fixing those bugs
>>>> and resolving those weird issues.
>>>>
>>>> Now I'm not saying which one is superior in this case, as the data
>>>> will end up speaking for itself one way or the other; but as of right
>>>> now, I'm not convinced that the old dictionary lookup is obsolete
>>>> yet, and I'm not sure the community is convinced yet either.
>>>>
>>>>
>>>>
>>>> [image: IMAT Solutions] <http://imatsolutions.com>
>>>>
>>>> *Kim Ebert*
>>>> Software Engineer
>>>> [image: Office:]801.669.7342
>>>> kim.ebert@imatsolutions.com <greg.hubert@imatsolutions.com>
>>>>
>>>> On 12/19/2014 08:39 AM, Chen, Pei wrote:
>>>>
>>>> Also check out stats that Sean ran before releasing the new component
>>>> on:
>>>>
>>>> http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx
>>>>
>>>> From the evaluation and experience, the new lookup algorithm should be
>>>> a huge improvement in terms of both speed and accuracy.
>>>>
>>>> This is very different from what Bruce mentioned…  I’m sure Sean will
>>>> chime in here.
>>>>
>>>> (The old dictionary lookup is essentially obsolete now- plagued with
>>>> bugs/issues as you mentioned.)
>>>>
>>>> --Pei
>>>>
>>>>
>>>>
>>>> *From:* Kim Ebert [mailto:kim.ebert@perfectsearchcorp.com
>>>> <kim.ebert@perfectsearchcorp.com>]
>>>> *Sent:* Friday, December 19, 2014 10:25 AM
>>>> *To:* dev@ctakes.apache.org
>>>> *Subject:* Re: cTakes Annotation Comparison
>>>>
>>>>
>>>>
>>>> Guergana,
>>>>
>>>> I'm curious about the number of records that are in your gold standard
>>>> sets, or if your gold standard set was run through a long running
>>>> cTAKES process.
>>>> I know at some point we fixed a bug in the old dictionary lookup that
>>>> caused the permutations to become corrupted over time. Typically this
>>>> isn't seen in the first few records, but over time as patterns are
>>>> used the permutations would become corrupted. This caused documents
>>>> that were fed through cTAKES more than once to have fewer codes
>>>> returned than the first time.
>>>>
>>>> For example, if a permutation of 4,2,3,1 was found, the permutation
>>>> would be corrupted to be 1,2,3,4. It would no longer be possible to
>>>> detect permutations of 4,2,3,1 until cTAKES was restarted. We got the
>>>> fix in after the cTAKES 3.2.0 release.
>>>> https://issues.apache.org/jira/browse/CTAKES-310
>>>> Depending upon the corpus size, I could see the permutation engine
>>>> eventually only having a single permutation of 1,2,3,4.
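The actual fix is tracked in CTAKES-310 and is not reproduced here. As a hypothetical sketch of this class of bug, the code below assumes the permutations live in a shared cache and that a caller sorts a cached list in place; all names are invented for illustration.

```python
# Shared cache of permutations, built once and reused for every document.
PERMUTATIONS = [[1, 2, 3, 4], [4, 2, 3, 1]]


def use_permutation_buggy(index):
    """Hypothetical bug: sorting in place mutates the shared cache."""
    perm = PERMUTATIONS[index]
    perm.sort()  # the cached [4, 2, 3, 1] silently becomes [1, 2, 3, 4]
    return perm


def use_permutation_fixed(index):
    """Sort a copy so the cached permutation stays intact."""
    return sorted(PERMUTATIONS[index])


use_permutation_fixed(1)
print(PERMUTATIONS)  # cache unchanged after the fixed call
use_permutation_buggy(1)
print(PERMUTATIONS)  # cache corrupted after the buggy call
```

In this sketch the 4,2,3,1 ordering is lost from the cache after one buggy call and cannot be matched again until the process restarts and rebuilds it, which mirrors the behaviour described above.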
>>>>
>>>> Typically though, this isn't very easily detected in the first 100 or
>>>> so documents.
>>>>
>>>> We discovered this issue when we made cTAKES have consistent output of
>>>> codes in our system.
>>>>
>>>>
>>>>
>>>> [image: IMAT Solutions] <http://imatsolutions.com>
>>>>
>>>> *Kim Ebert*
>>>> Software Engineer
>>>> [image: Office:]801.669.7342
>>>> kim.ebert@imatsolutions.com <greg.hubert@imatsolutions.com>
>>>> On 12/19/2014 07:05 AM, Savova, Guergana wrote:
>>>>
>>>> We are doing a similar kind of evaluation and will report the results.
>>>>
>>>>
>>>>
>>>> Before we released the Fast lookup, we did a systematic evaluation
>>>> across three gold standard sets. We did not see the trend that Bruce
>>>> reported below. The P, R and F1 results from the old dictionary lookup and
>>>> the fast one were similar.
>>>>
>>>>
>>>> Thank you everyone!
>>>>
>>>> --Guergana
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>
>>>> From: David Kincaid [mailto:kincaid.dave@gmail.com
>>>> <kincaid.dave@gmail.com>]
>>>>
>>>> Sent: Friday, December 19, 2014 9:02 AM
>>>>
>>>> To: dev@ctakes.apache.org
>>>>
>>>> Subject: Re: cTakes Annotation Comparison
>>>>
>>>>
>>>>
>>>> Thanks for this, Bruce! Very interesting work. It confirms what I've
>>>> seen in my small tests that I've done in a non-systematic way. Did you
>>>> happen to capture the number of false positives yet (annotations made by
>>>> cTAKES that are not in the human adjudicated standard)? I've seen a lot of
>>>> dictionary hits that are not actually entity mentions, but I haven't had a
>>>> chance to do a systematic analysis (we're working on our annotated gold
>>>> standard now). One great example is the antibiotic "Today". Every time the
>>>> word today appears in any text it is annotated as a medication mention when
>>>> it almost never is being used in that sense.
>>>>
>>>>
>>>> These results by themselves are quite disappointing to me. Both the
>>>> UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor
>>>> recall. It seems like the trade-off for more speed is a ten-fold (or more)
>>>> decrease in entity recognition.
>>>>
>>>>
>>>> Thanks again for sharing your results with us. I think they are very
>>>> useful to the project.
>>>>
>>>>
>>>> - Dave
>>>>
>>>>
>>>>
>>>> On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <
>>>> bruce.tietjen@perfectsearchcorp.com> wrote:
>>>>
>>>>
>>>> Actually, we are working on a similar tool to compare it to the human
>>>>
>>>> adjudicated standard for the set we tested against.  I didn't mention
>>>>
>>>> it before because the tool isn't complete yet, but initial results for
>>>>
>>>> the set (excluding those marked as "CUI-less") were as follows:
>>>>
>>>>
>>>>
>>>> Human adjudicated annotations: 4591 (excluding CUI-less)
>>>>
>>>>
>>>>
>>>> Annotations found matching the human adjudicated standard
>>>>
>>>> UMLSProcessor                  2245
>>>>
>>>> FastUMLSProcessor           215
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  [image: IMAT Solutions] <http://imatsolutions.com>
>>>> <http://imatsolutions.com>  Bruce Tietjen
>>>>
>>>> Senior Software Engineer
>>>>
>>>> [image: Mobile:] 801.634.1547
>>>>
>>>> bruce.tietjen@imatsolutions.com
>>>>
>>>>
>>>>
>>>> On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei
>>>>
>>>> <Pei.Chen@childrens.harvard.edu> wrote:
>>>>
>>>>
>>>>
>>>> Bruce,
>>>>
>>>> Thanks for this-- very useful.
>>>>
>>>> Perhaps Sean Finan can comment more,
>>>>
>>>> but it's also probably worth it to compare to an adjudicated human
>>>>
>>>> annotated gold standard.
>>>>
>>>>
>>>>
>>>> --Pei
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>
>>>> From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com
>>>> <bruce.tietjen@perfectsearchcorp.com>]
>>>>
>>>> Sent: Thursday, December 18, 2014 1:45 PM
>>>>
>>>> To: dev@ctakes.apache.org
>>>>
>>>> Subject: cTakes Annotation Comparison
>>>>
>>>>
>>>>
>>>> With the recent release of cTakes 3.2.1, we were very interested in
>>>>
>>>> checking for any differences in annotations between using the
>>>>
>>>> AggregatePlaintextUMLSProcessor pipeline and the
>>>>
>>>> AggregatePlaintextFastUMLSProcessor pipeline within this release of
>>>>
>>>> cTakes with its associated set of UMLS resources.
>>>>
>>>>
>>>>
>>>> We chose to use the SHARE 14-a-b Training data that consists of 199
>>>>
>>>> documents (Discharge  61, ECG 54, Echo 42 and Radiology 42) as the
>>>>
>>>> basis for the comparison.
>>>>
>>>>
>>>>
>>>> We decided to share a summary of the results with the development
>>>>
>>>> community.
>>>>
>>>>
>>>>
>>>> Documents Processed: 199
>>>>
>>>>
>>>>
>>>> Processing Time:
>>>>
>>>> UMLSProcessor           2,439 seconds
>>>>
>>>> FastUMLSProcessor    1,837 seconds
>>>>
>>>>
>>>>
>>>> Total Annotations Reported:
>>>>
>>>> UMLSProcessor                  20,365 annotations
>>>>
>>>> FastUMLSProcessor             8,284 annotations
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Annotation Comparisons:
>>>>
>>>> Annotations common to both sets:                                  3,940
>>>>
>>>> Annotations reported only by the UMLSProcessor:         16,425
>>>>
>>>> Annotations reported only by the FastUMLSProcessor:    4,344
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> If anyone is interested, the following was our test procedure:
>>>>
>>>>
>>>>
>>>> We used the UIMA CPE to process the document set twice, once using
>>>>
>>>> the AggregatePlaintextUMLSProcessor pipeline and once using the
>>>>
>>>> AggregatePlaintextFastUMLSProcessor pipeline. We used the
>>>>
>>>> WriteCAStoFile CAS consumer to write the results to output files.
>>>>
>>>>
>>>>
>>>> We used a tool we recently developed to analyze and compare the
>>>>
>>>> annotations generated by the two pipelines. The tool compares the
>>>>
>>>> two outputs for each file and reports any differences in the
>>>>
>>>> annotations (MedicationMention, SignSymptomMention,
>>>>
>>>> ProcedureMention, AnatomicalSiteMention, and
>>>>
>>>> DiseaseDisorderMention) between the two output sets. The tool
>>>>
>>>> reports the number of 'matches' and 'misses' between each annotation
>>>>
>>>> set. A 'match' is
>>>>
>>>> defined as the presence of an identified source text interval with
>>>>
>>>> its associated CUI appearing in both annotation sets. A 'miss' is
>>>>
>>>> defined as the presence of an identified source text interval and
>>>>
>>>> its associated CUI in one annotation set, but no matching identified
>>>>
>>>> source text interval and
>>>>
>>>> CUI in the other. The tool also reports the total number of
>>>>
>>>> annotations (source text intervals with associated CUIs) reported in
>>>>
>>>> each annotation set. The compare tool is in our GitHub repository at
>>>>
>>>> https://github.com/perfectsearch/cTAKES-compare
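The match and miss definitions above can be sketched with plain set operations. This is not the code from the compare tool; it is a minimal illustration that assumes each annotation reduces to a (begin, end, CUI) triple. The `Annotation` type, the `compare` function, and the sample offsets and CUIs are all invented.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    begin: int  # start offset of the source text interval
    end: int    # end offset of the source text interval
    cui: str    # associated UMLS concept identifier


def compare(a, b):
    """Return (matches, only_in_a, only_in_b) for two annotation sets."""
    a, b = set(a), set(b)
    return a & b, a - b, b - a


# Placeholder data; offsets and CUIs are invented.
umls = {Annotation(0, 5, "C0000001"), Annotation(10, 18, "C0000002")}
fast = {Annotation(0, 5, "C0000001"), Annotation(30, 36, "C0000003")}

matches, only_umls, only_fast = compare(umls, fast)
print(len(matches), len(only_umls), len(only_fast))
```

The reported totals are consistent with this kind of partition: 3,940 + 16,425 = 20,365 for the UMLSProcessor and 3,940 + 4,344 = 8,284 for the FastUMLSProcessor.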
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

