ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: Differences in MedicationMention annotations on subsequent processing runs
Date Thu, 09 Oct 2014 17:42:27 GMT
I just ran the –fast with an example containing  bacitracin in four sentences, once being
the first word and once being the last.  In ten of ten runs all four bacitracin mentions were

Did you completely replace the dictionary lookup with the following?
    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
      <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>

From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
Sent: Thursday, October 09, 2014 11:42 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent processing runs

I tried the dictionary-lookup-fast module and the behavior is the same. I did have to run
it a number of times before the timing was right to reproduce the issue. With the older lookup,
the chances were about 50/50 as to which dictionary ran first. With dictionary-lookup-fast, it
seems more like 70/30, with the standard UMLS lookup more likely to run first.
This means that most of the time there is no MedicationMention annotation for Bacitracin.
 (See Attached)
The code with the issue is the DictionaryLookupAnnotator, which is a container for the dictionaries
and iterates through the list of lookup dictionaries, so that part of the code path does
not seem to have changed.
In the past, the rxNorm dictionary was a Lucene search and so I'm guessing it behaved a little
differently than it does now with both being JDBC.
The fact that the filter is at this location seems to indicate that it may have been intended
to apply across all dictionaries. On the other hand, it appears to mask out the lookups
for the different dictionaries, resulting in some annotations not being made.

So, the real question is how should the filter work -- should the annotation filtering be
per lookup dictionary, or be across all dictionaries? Or is there something wrong elsewhere
that causes
I lean towards having the filter function per dictionary. This may risk having duplicate annotations,
but that would probably be better than missing the annotation altogether.
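
To make the two options concrete, here is a minimal sketch of the filtering logic. It is not the actual cTAKES code: the class DupFilterSketch, the method keptHits, the short dictionary labels "umls" and "rxnorm", and the span key "27,37" are all made up for illustration, and it assumes that two hits count as duplicates when they cover the same span with the same term (the real map holds MetaDataHit objects).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DupFilterSketch {

    // Returns the dictionaries whose hit on the span is kept, in lookup order.
    // Every dictionary is assumed to contain the same term for the same span.
    public static List<String> keptHits(List<String> dictionaryOrder,
                                        boolean sharedAcrossDictionaries) {
        // One map for the whole run, like the shared iv_dupMap described in the thread.
        Map<String, Set<String>> sharedDupMap = new HashMap<String, Set<String>>();
        List<String> kept = new ArrayList<String>();
        for (String dictionary : dictionaryOrder) {
            // Per-dictionary filtering starts each lookup with a fresh map.
            Map<String, Set<String>> dupMap = sharedAcrossDictionaries
                    ? sharedDupMap
                    : new HashMap<String, Set<String>>();
            String spanKey = "27,37"; // hypothetical "begin,end" key
            Set<String> seen = dupMap.computeIfAbsent(spanKey, k -> new HashSet<String>());
            // Set.add returns false when the term was already recorded for this span.
            if (seen.add("bacitracin")) {
                kept.add(dictionary);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> umlsFirst = Arrays.asList("umls", "rxnorm");
        List<String> rxnormFirst = Arrays.asList("rxnorm", "umls");
        System.out.println(keptHits(umlsFirst, true));    // [umls]
        System.out.println(keptHits(rxnormFirst, true));  // [rxnorm]
        System.out.println(keptHits(umlsFirst, false));   // [umls, rxnorm]
    }
}
```

With the shared map, only whichever dictionary runs first keeps its hit, so the result depends on execution order. With a map per dictionary, both hits are kept regardless of order, at the cost of a possible duplicate annotation.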

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer

On Wed, Oct 8, 2014 at 10:02 AM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:
Hi Bruce,

With Pei's help I just updated the SourceForge repo with the cTakes dictionaries. Check out
artifact ctakes-resources-snomed-rword-hsqldb-2011ab


-----Original Message-----
From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
Sent: Wednesday, October 08, 2014 11:52 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in MedicationMention annotations on subsequent processing runs

If I understand correctly, I would need new dictionary resources to run the
rare word lookup method.

Where can I find the necessary dictionary(ies) or how do I build them?

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547

On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

>  Hi Bruce,
> I would venture to say that this is neither expected nor desired.
> Before you fix it (or in addition to a fix), try to run with the new
> dictionary lookup.   It will have a different behavior, and it will be the
> default dictionary lookup in future releases of cTakes – making fixes to
> the current module slightly less urgent.
> Sean
> *From:* Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
> *Sent:* Wednesday, October 08, 2014 11:38 AM
> *To:* dev@ctakes.apache.org
> *Subject:* Differences in MedicationMention annotations on subsequent
> processing runs
> I have encountered a situation in which the cTakes clinical pipeline
> output differs between multiple runs on the same text with the same
> configuration.
> The following snippets from a single document are sufficient to
> demonstrate the issue:
>  a gentle curve going into. irrigated with Bacitracin.
> The source of the difference is that the DictionaryLookupAnnotator uses a
> map to filter out duplicate annotations for a single document location:
>     // used to prevent duplicate hits
>     // key = hit begin,end key (java.lang.String)
>     // val = Set of MetaDataHit objects
>     private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
> This map is shared between both the umls_ms_2011ab lookup and the
> umls_ms_2011an_rxnorm lookup.
> If both dictionaries contain the same term, the order of dictionary lookup
> execution determines the output. If the rxnorm lookup runs first, then a
> MedicationMention annotation for Bacitracin appears in the final output. If
> the standard umls lookup runs first, then there is no MedicationMention
> annotation for Bacitracin.
> I will attach the output from the subsequent runs. (Hopefully the
> attachment will make it through the system)
> Is this expected behavior? If not, what would be the expected behavior?
> IMAT Solutions <http://imatsolutions.com>
> *Bruce Tietjen*
> Senior Software Engineer
> Mobile: 801.634.1547
> bruce.tietjen@imatsolutions.com
