ctakes-user mailing list archives

From Bruce Tietjen <bruce.tiet...@perfectsearchcorp.com>
Subject Re: Inconsistent IdentifiedAnnotation in different runs
Date Fri, 24 Jul 2015 16:11:33 GMT
You might want to check whether this is caused by the dictionary lookups
executing in a different order.

If I remember correctly, the order in which the dictionary lookups execute
is determined by the iteration order of the HashSet they are added to
during initialization. When cTAKES was first implemented, HashSet iteration
order was effectively the same from run to run, but a couple of years ago
(for security reasons) Java's hashing was changed so that the order is no
longer stable between runs. As a result, the dictionary lookups can execute
in a different order on each run.

(We have modified our copy of cTAKES by replacing many of the HashSets it
uses with LinkedHashSet, so that iterators always return the contents in
the order they were added. This has helped us achieve consistent results
between executions.)
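
A minimal sketch of the difference, not the actual cTAKES code: LookupSpec is
a hypothetical stand-in for whatever objects cTAKES stores in the set, and I am
assuming they do not override hashCode(), so their position in a HashSet
depends on the identity hash code, which is not guaranteed to be the same from
one JVM run to the next.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LookupOrderDemo {

    /** Hypothetical stand-in for a dictionary lookup spec; does not override hashCode(). */
    static final class LookupSpec {
        final String dictionaryId;

        LookupSpec(String dictionaryId) {
            this.dictionaryId = dictionaryId;
        }
    }

    /** Adds one spec per id to the given set, then returns the ids in the set's iteration order. */
    static List<String> iterationOrder(Set<LookupSpec> specs, String... dictionaryIds) {
        for (String id : dictionaryIds) {
            specs.add(new LookupSpec(id));
        }
        List<String> order = new ArrayList<>();
        for (LookupSpec spec : specs) {
            order.add(spec.dictionaryId);
        }
        return order;
    }

    public static void main(String[] args) {
        String[] ids = {"DICT_UMLS_MS", "DICT_RXNORM_MS"};

        // HashSet: iteration order follows the hash codes, so it is not
        // guaranteed and may differ between runs.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<>(), ids));

        // LinkedHashSet: iteration order is always the insertion order.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<>(), ids));
    }
}
```

Running this a few times should show the LinkedHashSet line staying fixed,
while the HashSet line is free to change. If the lookups for two dictionaries
can hit the same text, which one runs first may matter.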



On Fri, Jul 24, 2015 at 1:07 AM, Prashasti Agrawal <
prashasti.agrawal@wincere.com> wrote:

> Hi Chen Pei,
>
> I figured out where the problem is, but I cannot work out the reason or a
> solution. I had configured my own dictionary from the UMLS knowledge
> sources. I made two tables in MySQL, one containing CUIs from the
> SNOMEDCT source (umls_snomed_2015, for diseases, symptoms, etc.) and the
> other containing CUIs from RXNORM (umls_rxNorm_2015, for medications).
> After a lot of debugging and print statements, I found that in the
> lookupConsumer (UmlsToSnomedDbConsumerImpl), lookup hits are sometimes
> matched against the valid TUIs in DICT_UMLS_MS and sometimes against the
> valid TUIs in DICT_RXNORM_MS. I have attached the LookUpDesc_Db file for
> reference.
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!--
>     Licensed to the Apache Software Foundation (ASF) under one
>     or more contributor license agreements.  See the NOTICE file
>     distributed with this work for additional information
>     regarding copyright ownership.  The ASF licenses this file
>     to you under the Apache License, Version 2.0 (the
>     "License"); you may not use this file except in compliance
>     with the License.  You may obtain a copy of the License at
>
>       http://www.apache.org/licenses/LICENSE-2.0
>
>     Unless required by applicable law or agreed to in writing,
>     software distributed under the License is distributed on an
>     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
>     KIND, either express or implied.  See the License for the
>     specific language governing permissions and limitations
>     under the License.
> -->
> <lookupSpecification>
>   <!-- Defines what dictionaries will be used in terms of implementation specifics and metaField configuration. -->
>   <dictionaries>
>     <dictionary id="DICT_UMLS_MS" externalResourceKey="DbConnection" caseSensitive="false">
>       <implementation>
>         <jdbcImpl tableName="umls_ms_2015"/>
>       </implementation>
>       <lookupField fieldName="fword"/>
>       <metaFields>
>         <metaField fieldName="cui"/>
>         <metaField fieldName="tui"/>
>         <metaField fieldName="text"/>
>       </metaFields>
>     </dictionary>
>     <dictionary id="DICT_RXNORM_MS" externalResourceKey="DbConnection" caseSensitive="false">
>       <implementation>
>         <jdbcImpl tableName="umls_rxNorm_2015"/>
>       </implementation>
>       <lookupField fieldName="fword"/>
>       <metaFields>
>         <metaField fieldName="cui"/>
>         <metaField fieldName="tui"/>
>         <metaField fieldName="text"/>
>       </metaFields>
>     </dictionary>
>   </dictionaries>
>   <!-- Binds together the components necessary to perform the complete lookup logic start to end. -->
>   <lookupBindings>
>     <lookupBinding>
>       <dictionaryRef idRef="DICT_UMLS_MS"/>
>       <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
>         <properties>
>           <property key="textMetaFields" value="text"/>
>           <property key="maxPermutationLevel" value="7"/>
>           <!-- <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.Sentence"/> -->
>           <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
>           <property key="exclusionTags" value="VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"/>
>         </properties>
>       </lookupInitializer>
>       <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
>         <properties>
>           <property key="codingScheme" value="SNOMED"/>
>           <property key="cuiMetaField" value="cui"/>
>           <property key="tuiMetaField" value="tui"/>
>           <property key="textMetaField" value="text"/>
>           <property key="anatomicalSiteTuis" value="T021,T022,T023,T024,T025,T026,T029,T030"/>
>           <property key="procedureTuis" value="T060,T061"/>
>           <property key="disorderTuis" value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
>           <property key="findingTuis" value="T033,T034,T040,T041,T042,T043,T044,T045,T056,T057,T184"/>
>           <property key="labTuis" value="T059,T116"/>
>           <property key="dbConnExtResrcKey" value="DbConnection"/>
>           <property key="mapPrepStmt" value="select code from umls_snomed_map where cui=?"/>
>         </properties>
>       </lookupConsumer>
>     </lookupBinding>
>     <lookupBinding>
>       <dictionaryRef idRef="DICT_RXNORM_MS"/>
>       <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
>         <properties>
>           <property key="textMetaFields" value="text"/>
>           <property key="maxPermutationLevel" value="7"/>
>           <!-- <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.Sentence"/> -->
>           <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
>           <property key="exclusionTags" value="VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"/>
>         </properties>
>       </lookupInitializer>
>       <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
>         <properties>
>           <property key="codingScheme" value="RXNORM"/>
>           <property key="cuiMetaField" value="cui"/>
>           <property key="tuiMetaField" value="tui"/>
>           <property key="textMetaField" value="text"/>
>           <property key="medicationTuis" value="T073,T103,T109,T110,T111,T115,T121,T122,T123,T130,T168,T192,T195,T197,T200,T203"/>
>           <property key="dbConnExtResrcKey" value="DbConnection"/>
>           <property key="mapPrepStmt" value="select code from umls_rxNorm_map where cui=?"/>
>         </properties>
>       </lookupConsumer>
>     </lookupBinding>
>   </lookupBindings>
> </lookupSpecification>
>
>
>
>
>   Regards,
>
> Prashasti Agrawal | Data Engineer | Noida INDIA | GMT +5:30 hours
>
> Mobile +91 9818812484 | prashasti.agrawal@wincere.com |
>
>
>
>  www.wincere.com
>
> DISCLAIMER: This electronic transmission is governed by Wincere Inc. Any
> views or opinions expressed in this email are solely those of the author
> and do not necessarily reflect the opinions of Wincere Inc. If you have
> received this email in error, please delete all copies from your system and
> notify the sender or contact us at: +1 855 855 2946 or support@wincere.com.
>
>
>
>
>
>  ------------------------------
> *From:* Chen, Pei <Pei.Chen@childrens.harvard.edu>
> *Sent:* Friday, July 24, 2015 12:11 AM
> *To:* user@ctakes.apache.org
> *Subject:* RE: Inconsistent IdentifiedAnnotation in different runs
>
>
> By any chance, are you running this in multi-threaded mode within the same
> JVM? And do you have LVG included in the pipeline?
>
> I vaguely recall there was some non-thread-safe code in the LVG component
> (I don't recall whether the fix made it into the latest release).
>
> If the behavior persists, would you be able to help recreate it with
> sample/dummy examples that could be shared? In particular, the output XMI
> files?
>
> --Pei
>
>
>
> *From:* Prashasti Agrawal [mailto:prashasti.agrawal@wincere.com]
> *Sent:* Thursday, July 23, 2015 5:05 AM
> *To:* user@ctakes.apache.org
> *Subject:* Inconsistent IdentifiedAnnotation in different runs
>
>
>
> Hi,
>
>
>
> I am running the AggregatePlainTextUMLSProcessor analysis engine on an EMR
> document. I have added some modules, such as drug NER and the template
> filler, to the pipeline. I am getting different IdentifiedAnnotations in
> different runs on the same document (for example, 8 DiseaseDisorderMention
> annotations in one run and 15 in another).
>
> I am unable to understand why this is so. What am I missing here?
>
>
>
> Regards,
>
> Prashasti Agrawal | Data Engineer | Noida INDIA | GMT +5:30 hours
>
> Mobile +91 9818812484 | prashasti.agrawal@wincere.com |
>
>
>
> www.wincere.com
>
>
