ctakes-dev mailing list archives

From Bruce Tietjen <bruce.tiet...@perfectsearchcorp.com>
Subject Re: Differences in MedicationMention annotations on subsequent processing runs
Date Thu, 09 Oct 2014 15:41:23 GMT
I tried the dictionary-lookup-fast module and the behavior is the same. I
did have to run it a number of times before the timing was right to
reproduce the issue. With the older lookup, chances were about 50/50 as to
which dictionary ran first. With dictionary-lookup-fast, it seems more like
70/30, with the standard umls lookup being the more likely to run first.
That means that most of the time, there is no MedicationMention annotation
for Bacitracin. (See attached.)

The code with the issue is the DictionaryLookupAnnotator, which is a
container for the dictionaries and iterates through the list of lookup
dictionaries, so that part of the code path does not seem to have changed.

In the past, the rxNorm dictionary was a Lucene search and so I'm guessing
it behaved a little differently than it does now with both being JDBC.

The fact that the filter is at this location seems to indicate that it may
have been intended to apply across all dictionaries. On the other hand, it
appears to mask out lookups from the different dictionaries, resulting in
some annotations not being made.

So, the real question is how the filter should work: should the annotation
filtering be per lookup dictionary, or across all dictionaries? Or is there
something wrong elsewhere that causes this?

I lean towards having the filter function per dictionary. This may risk
having duplicate annotations, but that would probably be better than
missing the annotation altogether.
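
To make the two options concrete, here is a minimal Java sketch. It is not
the actual cTakes code: the class, the Hit type, the dictionary names "umls"
and "rxnorm", the offsets, and the term key are all hypothetical, and it
assumes that hits from both dictionaries compare equal at the same span.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DupFilterSketch {

    /** Hypothetical stand-in for a dictionary hit: source dictionary, span, term key. */
    static final class Hit {
        final String dictionary;
        final int begin;
        final int end;
        final String termKey;

        Hit(String dictionary, int begin, int end, String termKey) {
            this.dictionary = dictionary;
            this.begin = begin;
            this.end = end;
            this.termKey = termKey;
        }
    }

    /** Records the hit in dupMap; returns true if the same term was already seen at that span. */
    static boolean isDuplicate(Map<String, Set<String>> dupMap, Hit hit) {
        String spanKey = hit.begin + "," + hit.end;
        return !dupMap.computeIfAbsent(spanKey, k -> new HashSet<>()).add(hit.termKey);
    }

    /** One map shared by all dictionaries: whichever dictionary runs first claims the span. */
    static List<String> sharedFilter(List<Hit> hitsInRunOrder) {
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (Hit hit : hitsInRunOrder) {
            if (!isDuplicate(dupMap, hit)) {
                kept.add(hit.dictionary);
            }
        }
        return kept;
    }

    /** One map per dictionary: duplicates are suppressed only within a single dictionary. */
    static List<String> perDictionaryFilter(List<Hit> hitsInRunOrder) {
        Map<String, Map<String, Set<String>>> dupMaps = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (Hit hit : hitsInRunOrder) {
            Map<String, Set<String>> dupMap =
                    dupMaps.computeIfAbsent(hit.dictionary, k -> new HashMap<>());
            if (!isDuplicate(dupMap, hit)) {
                kept.add(hit.dictionary);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative offsets for "Bacitracin"; not taken from the real document.
        Hit umls = new Hit("umls", 15, 25, "bacitracin");
        Hit rxnorm = new Hit("rxnorm", 15, 25, "bacitracin");

        System.out.println(sharedFilter(Arrays.asList(umls, rxnorm)));        // [umls]
        System.out.println(sharedFilter(Arrays.asList(rxnorm, umls)));        // [rxnorm]
        System.out.println(perDictionaryFilter(Arrays.asList(umls, rxnorm))); // [umls, rxnorm]
    }
}
```

With the shared map, the surviving hit depends on run order, which matches
what I am seeing. With a map per dictionary, both hits survive regardless of
order, at the cost of possible duplicates across dictionaries.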

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tietjen@imatsolutions.com

On Wed, Oct 8, 2014 at 10:02 AM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Bruce,
>
> With Pei's help I just updated the SourceForge repo with the cTakes
> dictionaries.  Check out the artifact ctakes-resources-snomed-rword-hsqldb-2011ab
>
> Sean
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
> Sent: Wednesday, October 08, 2014 11:52 AM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in MedicationMention annotations on subsequent
> processing runs
>
> If I understand correctly, I would need new dictionary resources to run the
> rare word lookup method.
>
> Where can I find the necessary dictionary(ies) or how do I build them?
>
>
> IMAT Solutions <http://imatsolutions.com>
> Bruce Tietjen
> Senior Software Engineer
> Mobile: 801.634.1547
> bruce.tietjen@imatsolutions.com
>
> On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> >  Hi Bruce,
> >
> > I would venture to say that this is neither expected nor desired.
> >
> >
> >
> > Before you fix it (or in addition to a fix), try to run with the new
> > dictionary lookup.   It will have a different behavior, and it will be the
> > default dictionary lookup in future releases of cTakes – making fixes to
> > the current module slightly less urgent.
> >
> >
> >
> > Sean
> >
> >
> >
> > *From:* Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
> > *Sent:* Wednesday, October 08, 2014 11:38 AM
> > *To:* dev@ctakes.apache.org
> > *Subject:* Differences in MedicationMention annotations on subsequent
> > processing runs
> >
> >
> >
> >
> >
> > I have encountered a situation in which the cTakes clinical pipeline
> > output differs between multiple runs on the same text with the same
> > configuration.
> >
> > The following snippets from a single document are sufficient to
> > demonstrate the issue:
> >
> >  a gentle curve going into. irrigated with Bacitracin.
> >
> >
> >
> > The source of the difference is that the DictionaryLookupAnnotator uses a
> > map to filter out duplicate annotations for a single document location:
> >
> >     // used to prevent duplicate hits
> >     // key = hit begin,end key (java.lang.String)
> >     // val = Set of MetaDataHit objects
> >     private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
> >
> >  This map is shared between the umls_ms_2011ab lookup and the
> > umls_ms_2011an_rxnorm lookup.
> >
> >
> >
> > If both dictionaries contain the same term, the order of dictionary
> > lookup execution determines the output. If the rxnorm lookup runs first,
> > then a MedicationMention annotation for Bacitracin appears in the final
> > output. If the standard umls lookup runs first, then there is no
> > MedicationMention annotation for Bacitracin.
> >
> > I will attach the output from the subsequent runs. (Hopefully the
> > attachment will make it through the system)
> >
> >
> >
> > Is this expected behavior? If not, what would be the expected behavior?
> >
> >
> >
> > IMAT Solutions <http://imatsolutions.com>
> >
> > *Bruce Tietjen*
> > Senior Software Engineer
> > Mobile: 801.634.1547
> > bruce.tietjen@imatsolutions.com
> >
>
