ctakes-dev mailing list archives

From Bruce Tietjen <bruce.tiet...@perfectsearchcorp.com>
Subject Re: Differences in MedicationMention annotations on subsequent processing runs
Date Thu, 09 Oct 2014 18:23:07 GMT
Sorry, my mistake, it was still running the old dictionary lookups.

Since your earlier question, I have been trying to get lookup-fast to
work and have not yet been successful.

I made this change to AggregatePlaintextUMLSProcessor.xml:

<!--
    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
      <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml"/>
    </delegateAnalysisEngine>
-->

    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
      <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
    </delegateAnalysisEngine>



But I keep getting the following exception and am still trying to figure out why:

Caused by: org.apache.uima.resource.ResourceInitializationException: Could not access the resource data at file:org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml.
    at org.apache.uima.resource.impl.DataResource_impl.initialize(DataResource_impl.java:127)
    at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:123)
    ... 31 more
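
One quick check for an error like this is whether the file named in the exception is visible to the class loader at all. The sketch below assumes the descriptor is meant to be resolved from the classpath, which is an assumption (UIMA may also look it up through its data path). ResourceCheck and onClasspath are hypothetical names, not part of cTAKES or UIMA.

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of cTAKES or UIMA.
final class ResourceCheck {

    /** Returns true if the system class loader can locate the named resource. */
    static boolean onClasspath(String resourcePath) {
        URL url = ClassLoader.getSystemResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Path taken from the exception message above.
        String path = "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml";
        System.out.println(path + " on classpath: " + onClasspath(path));
    }
}
```

If this prints false when run with the same classpath as the pipeline, the module's resources are probably not on the classpath. If it prints true, the problem is more likely in how the path is being resolved.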





Bruce Tietjen
Senior Software Engineer
IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
bruce.tietjen@imatsolutions.com

On Thu, Oct 9, 2014 at 11:42 AM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> I just ran the -fast lookup on an example containing bacitracin in four
> sentences, once as the first word and once as the last. In ten of
> ten runs all four bacitracin mentions were discovered.
>
> Did you completely replace the dictionary lookup with the following?
>     <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
>       <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
>     </delegateAnalysisEngine>
>
>
> From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
> Sent: Thursday, October 09, 2014 11:42 AM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in MedicationMention annotations on subsequent
> processing runs
>
> I tried the dictionary-lookup-fast module and the behavior is the same. I
> did have to run it a number of times before the timing was right to
> reproduce the issue. With the older lookup, the chances were about 50/50 as
> to which dictionary ran first. With dictionary-lookup-fast, it seems more
> like 70/30, with the standard UMLS lookup more likely to run first. That
> means that most of the time there is no MedicationMention annotation for
> Bacitracin. (See attached.)
>
> The code with the issue is the DictionaryLookupAnnotator, which is a
> container for the dictionaries. It iterates through the list of lookup
> dictionaries, so that part of the code path does not seem to have changed.
> In the past, the RxNorm dictionary was a Lucene search, so I'm guessing
> it behaved a little differently than it does now, with both being JDBC.
>
> The fact that the filter is at this location seems to indicate that it may
> have been intended to apply across all dictionaries. On the other
> hand, it appears to mask out the lookups for the different dictionaries,
> resulting in some annotations not being made.
>
> So the real question is how the filter should work: should the
> annotation filtering be per lookup dictionary, or across all
> dictionaries? Or is there something wrong elsewhere that causes
>
> I lean towards having the filter function per dictionary. This may risk
> duplicate annotations, but that would probably be better than
> missing the annotation altogether.
>
>
> On Wed, Oct 8, 2014 at 10:02 AM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu<mailto:Sean.Finan@childrens.harvard.edu>>
> wrote:
> Hi Bruce,
>
> With Pei's help I just updated the SourceForge repo with the cTakes
> dictionaries. Check out artifact ctakes-resources-snomed-rword-hsqldb-2011ab
>
> Sean
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
> Sent: Wednesday, October 08, 2014 11:52 AM
> To: dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
> Subject: Re: Differences in MedicationMention annotations on subsequent
> processing runs
>
> If I understand correctly, I would need new dictionary resources to run the
> rare word lookup method.
>
> Where can I find the necessary dictionary(ies) or how do I build them?
>
>
> On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu<mailto:Sean.Finan@childrens.harvard.edu>>
> wrote:
>
> >  Hi Bruce,
> >
> > I would venture to say that this is neither expected nor desired.
> >
> >
> >
> > Before you fix it (or in addition to a fix), try to run with the new
> > dictionary lookup. It will have a different behavior, and it will be the
> > default dictionary lookup in future releases of cTakes, making fixes to
> > the current module slightly less urgent.
> >
> >
> >
> > Sean
> >
> >
> >
> > *From:* Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
> > *Sent:* Wednesday, October 08, 2014 11:38 AM
> > *To:* dev@ctakes.apache.org<mailto:dev@ctakes.apache.org>
> > *Subject:* Differences in MedicationMention annotations on subsequent
> > processing runs
> >
> >
> >
> >
> >
> > I have encountered a situation in which the cTakes clinical pipeline
> > output differs between multiple runs on the same text with the same
> > configuration.
> >
> > The following snippets from a single document are sufficient to
> > demonstrate the issue:
> >
> >  a gentle curve going into. irrigated with Bacitracin.
> >
> >
> >
> > The source of the difference is that the DictionaryLookupAnnotator uses a
> > map to filter out duplicate annotations for a single document location:
> >
> >     // used to prevent duplicate hits
> >     // key = hit begin,end key (java.lang.String)
> >     // val = Set of MetaDataHit objects
> >     private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();
> >
> > This map is shared between the umls_ms_2011ab lookup and the
> > umls_ms_2011an_rxnorm lookup.
> >
> > If both dictionaries contain the same term, the order of dictionary lookup
> > execution determines the output. If the rxnorm lookup runs first, then a
> > MedicationMention annotation for Bacitracin appears in the final output. If
> > the standard umls lookup runs first, then there is no MedicationMention
> > annotation for Bacitracin.
> >
> > I will attach the output from the subsequent runs. (Hopefully the
> > attachment will make it through the system)
> >
> >
> >
> > Is this expected behavior? If not, what would be the expected behavior?
> >
> >
>
>
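
The order dependence described in the original report at the bottom of this thread can be reproduced in a few lines. This is a minimal sketch, not the cTAKES code: DupFilterDemo, Dictionary, annotate and the "UmlsHit" label are hypothetical names, and it assumes that hits from the two dictionaries compare equal when they cover the same span and term. It contrasts one duplicate map shared by all dictionaries with a fresh map per dictionary, the alternative proposed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the duplicate filter; not the cTAKES implementation.
final class DupFilterDemo {

    /** One lookup dictionary: the annotation type it emits and the terms it contains. */
    static final class Dictionary {
        final String annotationType;
        final Set<String> terms;

        Dictionary(String annotationType, String... terms) {
            this.annotationType = annotationType;
            this.terms = new HashSet<>(Arrays.asList(terms));
        }
    }

    /**
     * Looks up one span in each dictionary, in the given order, and returns the
     * annotation types that survive duplicate filtering.
     * sharedFilter = true:  one dup map for all dictionaries (the reported behavior).
     * sharedFilter = false: a fresh dup map for each dictionary (the proposed change).
     */
    static List<String> annotate(String spanKey, String term,
                                 List<Dictionary> order, boolean sharedFilter) {
        Map<String, Set<String>> dupMap = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (Dictionary dict : order) {
            if (!sharedFilter) {
                dupMap = new HashMap<>(); // forget hits seen by earlier dictionaries
            }
            if (!dict.terms.contains(term)) {
                continue;
            }
            Set<String> seen = dupMap.computeIfAbsent(spanKey, k -> new HashSet<>());
            // Set.add returns false if an equal hit was already recorded for this span.
            if (seen.add(term)) {
                emitted.add(dict.annotationType);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Dictionary umls = new Dictionary("UmlsHit", "bacitracin");
        Dictionary rxnorm = new Dictionary("MedicationMention", "bacitracin");

        // Shared map: whichever dictionary runs first wins.
        System.out.println(annotate("44,54", "bacitracin", Arrays.asList(umls, rxnorm), true));
        // prints [UmlsHit]
        System.out.println(annotate("44,54", "bacitracin", Arrays.asList(rxnorm, umls), true));
        // prints [MedicationMention]

        // Per-dictionary map: both annotations are produced regardless of order.
        System.out.println(annotate("44,54", "bacitracin", Arrays.asList(umls, rxnorm), false));
        // prints [UmlsHit, MedicationMention]
    }
}
```

With the shared map the result depends only on iteration order, which matches the run-to-run differences reported above. The per-dictionary variant is deterministic, at the cost of possibly emitting more than one annotation for the same span.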
