From: Prashasti Agrawal
To: "user@ctakes.apache.org"
Subject: Re: Inconsistent IdentifiedAnnotation in different runs
Date: Fri, 24 Jul 2015 07:07:56 +0000
In-Reply-To: <924DE05C19409B438EB81DE683A942D9487B3037@CHEXMBX1A.CHBOSTON.ORG>

Hi Chen Pei,

I figured out where the problem is, but I have not been able to work out the cause or a solution. I had configured my own dictionary from the UMLS knowledge sources. I created two tables in MySQL: one containing CUIs from the SNOMEDCT source (umls_snomed_2015, for diseases, symptoms, etc.) and the other containing CUIs from RXNORM (umls_rxNorm_2015, for medications). After a lot of debugging and print statements, I found that in the lookupConsumer (UmlsToSnomedDbConsumerImpl), lookup hits are sometimes matched against the valid TUIs for DICT_UMLS_MS and sometimes against the valid TUIs for DICT_RXNORM_MS. I have attached the LookUpDesc_Db file for reference.


<?xml version="1.0" encoding="UTF-8"?>
<!--

    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

-->
<lookupSpecification>
  <!--  Defines what dictionaries will be used in terms of implementation specifics and metaField configuration. -->
  <dictionaries>
    <dictionary id="DICT_UMLS_MS" externalResourceKey="DbConnection" caseSensitive="false">
      <implementation>
        <jdbcImpl tableName="umls_ms_2015"/>
      </implementation>
      <lookupField fieldName="fword"/>
      <metaFields>
        <metaField fieldName="cui"/>
        <metaField fieldName="tui"/>
        <metaField fieldName="text"/>
      </metaFields>
    </dictionary>
    <dictionary id="DICT_RXNORM_MS" externalResourceKey="DbConnection" caseSensitive="false">
      <implementation>
        <jdbcImpl tableName="umls_rxNorm_2015"/>
      </implementation>
      <lookupField fieldName="fword"/>
      <metaFields>
        <metaField fieldName="cui"/>
        <metaField fieldName="tui"/>
        <metaField fieldName="text"/>
      </metaFields>
    </dictionary>
  </dictionaries>
  <!-- Binds together the components necessary to perform the complete lookup logic start to end. -->
  <lookupBindings>
    <lookupBinding>
      <dictionaryRef idRef="DICT_UMLS_MS"/>
      <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
        <properties>
          <property key="textMetaFields" value="text"/>
          <property key="maxPermutationLevel" value="7"/>
          <!-- <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.Sentence"/> -->
          <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
          <property key="exclusionTags" value="VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"/>
        </properties>
      </lookupInitializer>
      <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
        <properties>
          <property key="codingScheme" value="SNOMED"/>
          <property key="cuiMetaField" value="cui"/>
          <property key="tuiMetaField" value="tui"/>
          <property key="textMetaField" value="text"/>
          <property key="anatomicalSiteTuis" value="T021,T022,T023,T024,T025,T026,T029,T030"/>
          <property key="procedureTuis" value="T060,T061"/>
          <property key="disorderTuis" value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
          <property key="findingTuis" value="T033,T034,T040,T041,T042,T043,T044,T045,T056,T057,T184"/>
          <property key="labTuis" value="T059,T116"/>
          <property key="dbConnExtResrcKey" value="DbConnection"/>
          <property key="mapPrepStmt" value="select code from umls_snomed_map where cui=?"/>
        </properties>
      </lookupConsumer>
    </lookupBinding>
    <lookupBinding>
      <dictionaryRef idRef="DICT_RXNORM_MS"/>
      <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
        <properties>
          <property key="textMetaFields" value="text"/>
          <property key="maxPermutationLevel" value="7"/>
          <!-- <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.Sentence"/> -->
          <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
          <property key="exclusionTags" value="VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"/>
        </properties>
      </lookupInitializer>
      <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
        <properties>
          <property key="codingScheme" value="RXNORM"/>
          <property key="cuiMetaField" value="cui"/>
          <property key="tuiMetaField" value="tui"/>
          <property key="textMetaField" value="text"/>
          <property key="medicationTuis" value="T073,T103,T109,T110,T111,T115,T121,T122,T123,T130,T168,T192,T195,T197,T200,T203 "/>
          <property key="dbConnExtResrcKey" value="DbConnection"/>
          <property key="mapPrepStmt" value="select code from umls_rxNorm_map where cui=?"/>
        </properties>
      </lookupConsumer>
    </lookupBinding>
  </lookupBindings>
</lookupSpecification>
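
To make the symptom concrete, here is a minimal Python sketch (not cTAKES code) of what would happen when a hit is checked against the wrong binding's TUI lists. The TUI values are copied from the two lookupConsumer blocks in the descriptor; the names `valid_tuis` and `is_valid_hit` are made up for illustration. It assumes the consumer simply drops any hit whose TUI is not in its configured lists, and it trims whitespace around each TUI, which the real consumer may or may not do.

```python
# Valid TUIs per lookup binding, copied from the descriptor's lookupConsumer properties.
SNOMED_TUIS = {
    "anatomicalSiteTuis": "T021,T022,T023,T024,T025,T026,T029,T030",
    "procedureTuis": "T060,T061",
    "disorderTuis": "T019,T020,T037,T046,T047,T048,T049,T050,T190,T191",
    "findingTuis": "T033,T034,T040,T041,T042,T043,T044,T045,T056,T057,T184",
    "labTuis": "T059,T116",
}
RXNORM_TUIS = {
    # The trailing space after T203 is present in the descriptor as posted.
    "medicationTuis": "T073,T103,T109,T110,T111,T115,T121,T122,T123,"
                      "T130,T168,T192,T195,T197,T200,T203 ",
}


def valid_tuis(props):
    """Flatten the comma-separated property values into one set of TUIs."""
    return {
        tui.strip()
        for value in props.values()
        for tui in value.split(",")
        if tui.strip()
    }


def is_valid_hit(tui, props):
    """Hypothetical check: keep a hit only if its TUI is configured for the binding."""
    return tui in valid_tuis(props)


snomed = valid_tuis(SNOMED_TUIS)
rxnorm = valid_tuis(RXNORM_TUIS)

print(sorted(snomed & rxnorm))            # TUIs shared by both bindings
print(is_valid_hit("T047", SNOMED_TUIS))  # disorder TUI checked against its own binding
print(is_valid_hit("T047", RXNORM_TUIS))  # same TUI checked against the medication binding
```

Under these assumptions the two bindings share no TUIs, so a disorder hit evaluated with the RXNORM binding's list would be discarded. That would be consistent with the DiseaseDisorderMention count changing from run to run, although it does not explain why the wrong list gets used.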

  


Regards,

Prashasti Agrawal | Data Engineer | Noida INDIA | GMT +5:30 hours
Mobile +91 9818812484 | prashasti.agrawal@wincere.com | www.wincere.com

DISCLAIMER: This electronic transmission is governed by Wincere Inc. Any views or opinions expressed in this email are solely those of the author and do not necessarily reflect the opinions of Wincere Inc. If you have received this email in error, please delete all copies from your system and notify the sender or contact us at: +1 855 855 2946 or support@wincere.com.






From: Chen, Pei <Pei.Chen@childrens.harvard.edu>
Sent: Friday, July 24, 2015 12:11 AM
To: user@ctakes.apache.org
Subject: RE: Inconsistent IdentifiedAnnotation in different runs
 

By any chance, are you running this in multithreaded mode within the same JVM? And do you have LVG included in the pipeline?

I vaguely recall there was some non-thread-safe code in the LVG component (I don't recall whether the fix made it into the latest release yet).

If it's still showing this behavior, would you be able to help recreate it with sample/dummy examples that could be shared? In particular, the output XMI files?

--Pei

 

From: Prashasti Agrawal [mailto:prashasti.agrawal@wincere.com]
Sent: Thursday, July 23, 2015 5:05 AM
To: user@ctakes.apache.org
Subject: Inconsistent IdentifiedAnnotation in different runs

 

Hi,

 

I am running the AggregatePlainTextUMLSProcessor analysis engine on an EMR document. I have added some modules, such as drug NER and the template filler, to the pipeline. I am getting different IdentifiedAnnotations in different runs on the same document (for example, 8 DiseaseDisorderMentions in one run and 15 in another).

I am unable to understand why this happens. What am I missing here?

 


 
