From: "Geise, Brandon D."
To: "dev@ctakes.apache.org"
Subject: RE: Fast Dictionary Update
Date: Mon, 21 Sep 2015 13:34:36 +0000

Hi Tomasz,

Here are the steps I used based on Sean's help and the documentation.

1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"

2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to the location where the new UMLS DB will be created

3. Run DictionaryCreator2

    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS

4. Run CodeMapCreator

    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS

5. Copy the new DB files to their new location, create a copy of cTakesHsql.xml, and update the dictionary location in that copy

Thanks,
Brandon

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu]
Sent: Friday, September 18, 2015 11:39 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Brandon,

Yes, I think it would be great if you could document the steps you did so far and post them here.

Regards,
Tomasz
________________________________________
From: Geise, Brandon D.
[bdgeise@geisinger.edu]
Sent: Thursday, September 17, 2015 4:18 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I'm getting output now with CUIs. I think running the codeMap fixed the issue I was having.

Would it be beneficial if I documented my steps so they could be shared?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 12:57 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Making an alternate copy of cTakesHsql.xml and pointing to the new dictionary is all that is necessary. Do you see a message in the initialization output indicating that the dictionary db has been loaded?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Thursday, September 17, 2015 12:54 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Great, thanks. Both seemed to work for populating the script table.

Besides the path to the new dictionary needing to be changed in cTakesHsql.xml, does anything else need to be modified to use the new dictionary? My pipeline runs; however, there aren't any annotations related to the UMLS concepts. The only annotations I'm seeing are date, roman numeral, or modifier related. (My pipeline is UMLSFastProcessor with additions for modifiers and templatefiller.)

Any suggestions would be appreciated.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Thursday, September 17, 2015 10:40 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Correct, Hsql should automatically read the .log file upon first use, and then perform the inserts into the .script file.

In case you want to play it safe, check the README in the resources/ directory (where you got the hsqldb template). The last paragraph indicates how you can launch a simple sql tool to play with the db.
You will need to change the name of the db accordingly. Upon first launch of the sql tool everything should be moved from the .log to the .script file. It is a strange setup/workflow, but it seems to work.

Sean

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Thursday, September 17, 2015 10:31 AM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

When I run the tool it outputs a file with a .log extension that has all the insert statements. Do I copy this to the .script template from memcachedb in the dictionarytool project or should the inserts be put into the .script file by default on the program execution?

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:59 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Excellent!

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 9:55 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

No, I had changed it on the Tiny source file. I just changed the default file and it looks to be running as expected now.

Thank you for all your help and patience,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:35 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Did you add it to data/default/CtakesSources.txt?
If not then you need to specify -src ./data/tiny/CtakesSources.txt

Sorry for any confusion.

As soon as my inet isn't overloaded I'll download 2015AA and see if I can build a dictionary.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 8:14 PM
To: dev@ctakes.apache.org; dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Sean,

I added that and still had the same issue.

Thanks,
Brandon
_____________________________
From: Finan, Sean
Sent: Wednesday, September 16, 2015 7:56 PM
Subject: RE: Fast Dictionary Update
To:

And you added "SNOMEDCT_US" to data/tiny/CtakesSources.txt?

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu]
Sent: Wednesday, September 16, 2015 7:13 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I have exactly the same problem with the tool.

A grep on MRCONSO.RRF for "SNOMEDCT" or for "SNOMEDCT_US" shows many lines.
________________________________________
From: Geise, Brandon D. [bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 5:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, it finds "SNOMEDCT_US".

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 5:17 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ah, now I see what you mean. Can you do a grep on your MRCONSO.RRF for "SNOMEDCT"?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 4:04 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I tried changing as suggested.

Below is what I see for the snomed piece, but for RXNorm it writes terms at the end.

Reading list of Source Types from ./data/default/CtakesSources.txt
File Lines 1 list of Source Types 1
Reading list of Tuis from ./data/tiny/CtakesSnomedTuis.txt
File Lines 24 list of Tuis 24
Compiling list of Cuis with wanted Tuis using /patto/UMLS_Current_Version/META/MRSTY.RRF
File Line 200000 Cuis 60895
File Line 300000 Cuis 85750
File Line 400000 Cuis 135098
File Line 600000 Cuis 183925
File Line 1700000 Cuis 376338
File Line 1800000 Cuis 471009
File Line 1900000 Cuis 568375
File Line 2100000 Cuis 674715
File Line 2800000 Cuis 903583
File Line 3300000 Cuis 973791
File Lines 3370173 Cuis 999451
..................................................File Line 100000 Valid Cuis 0
[identical progress lines for File Line 200000 through File Line 8800000, in steps of 100000, all reporting Valid Cuis 0]
.............File Lines 8827152 Valid Cuis 0
Compiling map of Umls Cuis and Texts
..................................................File Line 100000 Terms 0
[identical progress lines for File Line 200000 through File Line 8800000, in steps of 100000, all reporting Terms 0]
.............File Line 8827152 Terms 0
Writing map of Cuis and Texts to pathtoUmls2015.bsv

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 4:00 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thank you! I believe that was a change post 2011! You should actually be ok with both SNOMEDCT and SNOMEDCT_US in CtakesSources.txt

Cheers,
Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.maite@gmail.com]
Sent: Wednesday, September 16, 2015 3:43 PM
To: dev@ctakes.apache.org
Subject: Re: Fast Dictionary Update

If this helps, I had to replace 'SNOMEDCT' with 'SNOMEDCT_US' in CtakesSources.txt.

On Wed, Sep 16, 2015 at 2:33 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

> I'm not sure that I understand your question. As I sent it, the anat,
> snomed and rxnorm are not separate runs. The args line I sent earlier
> is for a single run that will create a dictionary with snomed and
> rxnorm terms.
> The anatomy tui list has a special use in correctly
> processing snomed codes.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 3:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Ok, hopefully one last question.
>
> Based on your example everything runs; however, the Anat and Snomed
> runs don't produce any valid CUIs but RXNorm does. I'm not sure if
> this has anything to do with it but every UMLS source read is against MRSTY.
>
> Here's my command
>
> java -cp dictionarytool.jar;lib/*
> org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
> /path/to/UMLS/META -fd ./data/tiny -atui
> ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt
> -ol path\to\fileUmls2015.bsv
>
> Any suggestions?
>
> Thanks again,
> Brandon
>
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 3:05 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Yes, that will make the rare word dictionary in a memory-based hsql
> database - the same as the default for the dictionary-lookup-fast module.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 2:42 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Thanks Sean, much appreciated. To clarify, the example below would
> create the dictionary for use with the rare word approach?
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 2:16 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I just checked in a bin/dictionarytool.zip. It should have everything
> that you need (.jar, lib/, data/).
> java -cp dictionarytool.jar;lib/*
> org.apache.ctakes.dictionarytool.DictionaryCreator2 [args]
> Should do the trick.
>
> To recreate a 2015 version of the current ctakes dictionary, the
> arguments are:
> -umls my/path/to/2015AA/META -fd ./data/tiny -atui
> ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt
> -db jdbc:hsqldb:file:my/path/to/snorx2015 -tbl CUI_TERMS
>
> Create my/path/to/snorx2015 by copying
> resources/memdbtemplate/ctakesumls.properties to
> my/path/to/snorx2015.properties - there is a resources/README about this.
>
> Before populating a DB, I usually do a trial run first, writing to a
> flat file. Replace "-db ... -tbl ..." with "-ol my/path/to/testout.bsv"
>
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:49 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Sean,
>
> That'd be great.
>
> I think I'm building it incorrectly because after I build the jar and
> try to run specifying DictionaryCreator2 as the main class it says it
> can't find it. I'm not too familiar with Java and building
> projects/jars so it could be my ignorance causing the problem.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 1:45 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I can send you a jar or commit one pre-built. What goes wrong when you
> try to build the tool?
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:23 PM
> To: 'dev@ctakes.apache.org'
> Subject: Fast Dictionary Update
>
> Does someone have the DictionaryTool jar available?
> I'm having trouble
> creating the jar file from the project and would like to be able to
> create an updated UMLS fast dictionary for 2015.
>
> Thanks,
> Brandon
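
For anyone who hits the same "Valid Cuis 0" output: the fix reported in this thread was replacing SNOMEDCT with SNOMEDCT_US in CtakesSources.txt. A plain grep for "SNOMEDCT" also matches "SNOMEDCT_US", so grep alone cannot show which name the UMLS release actually uses. The following is a minimal Python sketch, not part of the dictionarytool, that counts exact source abbreviations instead. It assumes the standard pipe-delimited MRCONSO.RRF layout with SAB as the 12th field, and it assumes (the thread suggests this but does not confirm it) that the tool compares source names exactly. The function names and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# 0-based position of the SAB (source abbreviation) field in MRCONSO.RRF.
SAB_COLUMN = 11


def count_sources(mrconso_lines: Iterable[str]) -> Counter:
    """Count how many MRCONSO rows carry each exact source abbreviation."""
    counts: Counter = Counter()
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > SAB_COLUMN:
            counts[fields[SAB_COLUMN]] += 1
    return counts


def missing_sources(wanted: Iterable[str], counts: Counter) -> List[str]:
    """Return the wanted source names that never occur as an exact SAB value."""
    return [name for name in wanted if counts[name] == 0]


if __name__ == "__main__":
    # Two made-up rows in MRCONSO.RRF layout, for illustration only.
    sample = [
        "C0000001|ENG|P|L1|PF|S1|Y|A1||111||SNOMEDCT_US|PT|111|Example term one|9|N||",
        "C0000002|ENG|P|L2|PF|S2|Y|A2||222||RXNORM|IN|222|Example term two|0|N||",
    ]
    counts = count_sources(sample)
    # "SNOMEDCT" is reported as missing because only "SNOMEDCT_US" occurs.
    print(missing_sources(["SNOMEDCT", "RXNORM"], counts))
```

To check a real release, open MRCONSO.RRF as a text file, pass the file object to count_sources, and pass the entries listed in CtakesSources.txt as the wanted names.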