Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
From: Pei Chen
To: dev@ctakes.apache.org
Date: Wed, 9 Dec 2015 00:11:57 -0500

Brandon,

That sounds great!
Please open a Jira ticket for any contributions (anyone should be able to create a Jira account). There are some legal items built into ASF Jira attachments for accepting contributions/donations, and the ticket will also credit the contributors appropriately. Anyone who is interested can follow the Jira item. (Even better if contributions happen through open discussion/open development.)

--Pei

On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D. wrote:
> I'd be interested in contributing to making the dictionary tool more user-friendly with a GUI.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Tuesday, December 08, 2015 6:12 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
>
> Hi Dave,
>
> I'm always happy to see interest in our stuff!
>
>>Step 1
> I built the tool to be able to build a dictionary using anything in the UMLS (snomed, icd9, hpo, etc.), so using the veterinary extension shouldn't be a problem. You just add it to the CtakesSources file (or create an alternate file and point to it with -src). To answer another of your questions, there can be zero or more sources: you saw snomedct and snomedct_us (each valid in a different UMLS version).
> It can also include any semantic type; just add (or remove) the appropriate tuis in a different data file.
>
>>Step 2
> You have it right: you copy the templates to another location and output to that location. Otherwise you 'lose' your templates.
>
>>Steps 3 and 4
> The jar is built from source. I need to (soon) check in updates to the source, and at the same time I can check in a default prebuilt .jar. The lib/ directory is in the source repository.
>
> Various people have toyed with the idea of putting the tool into a ctakes module, putting it into an "installation package", making a gui ...
> The best option (imo) is probably to make an easy-to-use gui and keep a pre-built version in sandbox. Someday, after the rainbow, maybe I'll get a chance to do that ...
>
> Sean
>
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> Sent: Tuesday, December 08, 2015 4:57 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>
> Thanks, Sean! It's great that cTAKES may soon have an up-to-date database out of the box. Hopefully it will cut down on the need for many to build their own DBs. Thank you very much for doing that.
>
> Unfortunately, I will still need to build a custom one for us. I work in veterinary medicine, so I need to add the veterinary extension for SNOMED-CT to the database.
>
> I looked over the steps below that Brandon included and have some questions:
>
> Step 1 says to 'Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"'. The file that I have has two lines in it: the first line is SNOMED, the second line is SNOMEDCT_US. So this step doesn't really make sense.
>
> Step 2 should reference the two scripts as being in resource/memdbtemplate so others don't have to search for them. I'm not sure what it means to move them to "location to put new UMLS DB". Does that mean move them into a new directory where the newly created UMLS DB will get written?
>
> Steps 3 and 4 for running the tools reference dictionarytool.jar, which doesn't exist. Does one need to build that somehow from the source before running it? The command line also adds "lib/*" to the classpath. Is that the lib directory inside the dictionarytool source code or some other location?
>
> What else would I need to do to include the SNOMED-CT Veterinary Extension along with the snomedct and rxnorm sources?
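Sean's answer to Step 1 (add the source to the CtakesSources file, or point -src at an alternate file) amounts to maintaining a list of UMLS source abbreviations (SABs). A minimal sketch of reading such a list, assuming one SAB per line with blank lines ignored; the class name SourceList and the placeholder MY_VET_SAB are hypothetical, and this is not the dictionary tool's own parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical reader for a CtakesSources-style list of UMLS source abbreviations (SABs). */
public final class SourceList {

    private SourceList() {}

    /** Trims each line, skips blank lines, and keeps first-seen order without duplicates. */
    public static Set<String> parse(List<String> lines) {
        Set<String> sabs = new LinkedHashSet<>();
        for (String line : lines) {
            String sab = line.trim();
            if (!sab.isEmpty()) {
                sabs.add(sab);
            }
        }
        return sabs;
    }

    /** Reads a sources file (UTF-8), e.g. the default file or an alternate one passed with -src. */
    public static Set<String> read(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }

    public static void main(String[] args) {
        // MY_VET_SAB is a placeholder for whatever SAB your UMLS load assigns
        // to the SNOMED-CT Veterinary Extension.
        List<String> lines = List.of("SNOMEDCT", "SNOMEDCT_US", "", "MY_VET_SAB");
        System.out.println(parse(lines)); // prints [SNOMEDCT, SNOMEDCT_US, MY_VET_SAB]
    }
}
```

Under this model, listing both SNOMEDCT and SNOMEDCT_US is harmless: each name is valid in a different UMLS version, so only the one present in your release matches any rows.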
>
> I probably won't have time to try this out for a while yet, but when I do, I'd be happy to write up an easy-to-follow tutorial for building a custom dictionary, assuming I am able to get it to work.
>
> Has anyone considered making this tool available outside of the source code itself, for example by including it in the main cTAKES release? It seems there is demand for it.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:
>
>> Hi Brandon, thanks for finding and forwarding the instructions!
>>
>> I have checked in two new hsqldb dictionaries, both from the 2015AB
>> version of the UMLS. They both have codes for snomedct_us, rxnorm,
>> icd9cm and icd10pcs, as well as the usual cui, tui and preferred term mappings.
>>
>> One uses cuis filtered by snomed and rxnorm; the other adds cuis
>> filtered by icd9 and icd10.
>> What this means: cuis that exist for a [filter source] are added to
>> the dictionary, as are all text variations from all sources that
>> contain that cui. Both dictionaries also use the standard ctakes
>> semantic group tui filters.
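The filter Sean describes (keep a cui if any filter source contains it, then add every string for that cui from every source, subject to the tui filter) can be sketched over the raw UMLS files. This is an illustration under stated assumptions, not the dictionary tool's code: it assumes the standard pipe-delimited MRCONSO.RRF (CUI in column 0, SAB in 11, STR in 14) and MRSTY.RRF (CUI in 0, TUI in 1) layouts, ignores language and suppression flags, and the class name, row helpers, and CUIs in the example are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of "filter source" cui selection over UMLS RRF rows. */
public final class CuiFilterSketch {

    // Column positions in the pipe-delimited UMLS RRF files.
    private static final int CONSO_CUI = 0, CONSO_SAB = 11, CONSO_STR = 14;
    private static final int STY_CUI = 0, STY_TUI = 1;

    private CuiFilterSketch() {}

    /** Returns cui -> all strings (from any source) for cuis that pass both filters. */
    public static Map<String, Set<String>> build(List<String> mrconso, List<String> mrsty,
                                                 Set<String> filterSabs, Set<String> allowedTuis) {
        // Cuis whose semantic type (tui) is in the allowed set.
        Set<String> tuiOk = new HashSet<>();
        for (String row : mrsty) {
            String[] f = row.split("\\|", -1);
            if (allowedTuis.contains(f[STY_TUI])) {
                tuiOk.add(f[STY_CUI]);
            }
        }
        // Pass 1: cuis that exist in at least one filter source.
        Set<String> keep = new HashSet<>();
        for (String row : mrconso) {
            String[] f = row.split("\\|", -1);
            if (filterSabs.contains(f[CONSO_SAB]) && tuiOk.contains(f[CONSO_CUI])) {
                keep.add(f[CONSO_CUI]);
            }
        }
        // Pass 2: every text variation for a kept cui, whatever its source.
        Map<String, Set<String>> terms = new TreeMap<>();
        for (String row : mrconso) {
            String[] f = row.split("\\|", -1);
            if (keep.contains(f[CONSO_CUI])) {
                terms.computeIfAbsent(f[CONSO_CUI], k -> new TreeSet<>()).add(f[CONSO_STR]);
            }
        }
        return terms;
    }

    /** Test-data helper: an 18-field MRCONSO row with only CUI, SAB and STR meaningful. */
    public static String consoRow(String cui, String sab, String str) {
        return String.join("|", cui, "ENG", "P", "L0", "PF", "S0", "Y", "A0", "", "", "",
                sab, "PT", "X", str, "0", "N", "") + "|";
    }

    /** Test-data helper: a 6-field MRSTY row. */
    public static String styRow(String cui, String tui) {
        return String.join("|", cui, tui, "A0", "sty", "AT0", "") + "|";
    }

    public static void main(String[] args) {
        List<String> conso = List.of(
                consoRow("C0000001", "SNOMEDCT_US", "heart attack"),
                consoRow("C0000001", "MTH", "myocardial infarction"), // other source, same cui
                consoRow("C0000002", "MSH", "aspirin"));              // no filter source
        List<String> sty = List.of(styRow("C0000001", "T047"), styRow("C0000002", "T121"));
        System.out.println(build(conso, sty, Set.of("SNOMEDCT_US", "RXNORM"), Set.of("T047", "T121")));
        // prints {C0000001=[heart attack, myocardial infarction]}
    }
}
```

The second pass is what makes "all text variations from all sources" work: the MTH string is kept only because a SNOMEDCT_US row already qualified its cui.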
>>
>> The names are ctakessnorx2015 and ctakesicd2015.
>>
>> The snomed rxnorm dictionary:
>>
>> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>>
>> The snomed rxnorm icd9 icd10 dictionary:
>>
>> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>>
>> The svn root for the whole ugly thing is:
>> svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>>
>> Stats:
>> ctakessnorx2015
>>   545,913 Terms
>>   229,251 Concepts (Cuis)
>>   272,987 Snomed codes
>>    32,419 Rxnorm codes
>>    11,321 icd9 codes
>>        61 icd10 codes
>>
>> ctakesicd2015
>>   611,230 Terms
>>   282,211 Concepts
>>    18,626 icd9 codes
>>    45,818 icd10 codes
>>   Snomed and Rxnorm counts are the same
>>
>> So, adding the icd filters gave us an extra ~53,000 concepts and
>> ~65,000 terms.
>>
>> I would like to move this all to a better root (not
>> ctakes-resources-snomed-rword-hsqldb-2011ab), but I wasn't able to
>> write directly in trunk (??) and need to get moving on to other things.
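As a quick check on the stats above, the "~53,000 concepts and ~65,000 terms" figures are the differences between the two dictionaries' concept and term counts. The small program below only subtracts the quoted numbers; the class name is made up.

```java
import java.util.Locale;

/** Recomputes the "extra concepts / extra terms" figures from the quoted stats. */
public final class DictionaryDelta {

    // Counts copied from the stats for the two 2015AB dictionaries.
    public static final int SNORX_TERMS = 545_913, SNORX_CONCEPTS = 229_251;
    public static final int ICD_TERMS = 611_230, ICD_CONCEPTS = 282_211;

    public static final int EXTRA_CONCEPTS = ICD_CONCEPTS - SNORX_CONCEPTS;
    public static final int EXTRA_TERMS = ICD_TERMS - SNORX_TERMS;

    public static void main(String[] args) {
        // Locale.ROOT keeps ',' as the grouping separator regardless of the JVM locale.
        System.out.println(String.format(Locale.ROOT, "extra concepts: %,d", EXTRA_CONCEPTS)); // 52,960
        System.out.println(String.format(Locale.ROOT, "extra terms: %,d", EXTRA_TERMS));       // 65,317
    }
}
```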
>>
>> There is help on the ctakes wiki:
>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>> Though I should probably add a few items ...
>>
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>> Sent: Tuesday, December 08, 2015 12:51 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: ctakes with icd10
>>
>> Not to perpetuate the instructions again, but I sent these out not long
>> ago when I was going through the process and Sean was helping me.
>>
>> 1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to
>>    "SNOMEDCT_US"
>> 2. Copy ctakesumls.properties and ctakesumls.script from
>>    memdbtemplate to the location where the new UMLS DB will be written
>> 3. Run DictionaryCreator2:
>>    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> 4. Run CodeMapCreator:
>>    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> 5. Copy the new DB files to a new location, then create a copy of
>>    cTakesHsql.xml and update the dictionary location in it
>>
>> Thanks,
>> Brandon
>>
>> -----Original Message-----
>> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> Sent: Tuesday, December 08, 2015 12:47 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: ctakes with icd10
>>
>> This seems like a pretty common request, and with such an old version
>> of the UMLS database shipped with cTAKES it's only going to get worse.
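Brandon's steps 3 and 4 use a Windows classpath (dictionarytool.jar;lib/*). On Linux or macOS the separator is ':' instead, and lib/* must be quoted so the shell does not expand it. A hedged Java sketch that assembles the same two command lines portably using File.pathSeparator: the flags and class names are copied from the steps, the paths are placeholders, and DictionaryCommand is a hypothetical helper, not part of the dictionary tool.

```java
import java.io.File;
import java.util.List;

/** Hypothetical helper that builds the step 3 / step 4 command lines portably. */
public final class DictionaryCommand {

    public static final String CREATOR = "org.apache.ctakes.dictionarytool.DictionaryCreator2";
    public static final String CODE_MAPPER = "org.apache.ctakes.dictionarytool.CodeMapCreator";

    private DictionaryCommand() {}

    /** Same arguments as the steps; only the classpath separator changes per platform. */
    public static List<String> command(String mainClass, String umlsMetaDir,
                                       String atuiFile, String dbPath) {
        String classpath = "dictionarytool.jar" + File.pathSeparator + "lib/*";
        return List.of("java", "-cp", classpath, mainClass,
                "-umls", umlsMetaDir,
                "-atui", atuiFile,
                "-db", "jdbc:hsqldb:file:" + dbPath,
                "-tbl", "CUI_TERMS");
    }

    public static void main(String[] args) {
        // Placeholder paths; substitute your own UMLS META directory and output DB.
        for (String mainClass : List.of(CREATOR, CODE_MAPPER)) {
            List<String> cmd = command(mainClass, "/path/to/umls/META",
                    "./data/tiny/CtakesAnatTuis.txt", "/path/to/newdb/snorx2015");
            System.out.println(String.join(" ", cmd));
            // To actually run it (no shell involved, so lib/* reaches the JVM intact):
            // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

Passing the list to ProcessBuilder bypasses the shell, so the java launcher itself expands the lib/* classpath wildcard. From a Unix shell, the equivalent is to write the classpath as "dictionarytool.jar:lib/*" in quotes.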
>> I've been wanting to build a dictionary using the latest UMLS release
>> (as well as a custom database), so I would be happy to write up the
>> steps as I go through it. That assumes that I can dig up the instructions in the dev list.
>>
>> - Dave
>>
>> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
>> Sean.Finan@childrens.harvard.edu> wrote:
>>
>> > Hi Alaa,
>> >
>> > The -shortest- answer is that you'll need to run the dictionary
>> > creation tool. There are instructions in older devlist threads. By
>> > default the dictionary creation tool does add icd9 and icd10 tables
>> > to the dictionary.
>> > The problem is that in Umls 2011AB those codes weren't very well
>> > populated. The 2015AB icd# set is much richer, so those tables
>> > should be pretty good. Then in ctakes you would look up annotations
>> > by icd9 or icd10 codes instead of by cui (icdCode is the icd9 or icd10 code you want):
>> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icdCode );
>> > OntologyConceptUtil.getAnnotationsByCode( jcas, icdCode );
>> >
>> > Sean
>> >
>> > -----Original Message-----
>> > From: Savova, Guergana
>> > [mailto:Guergana.Savova@childrens.harvard.edu]
>> > Sent: Tuesday, December 08, 2015 12:17 PM
>> > To: dev@ctakes.apache.org
>> > Subject: RE: ctakes with icd10
>> >
>> > Hi Alaa,
>> > You need to create a resource from the terminology/ontology you want
>> > to use (in this case ICD9 or ICD10). Then run that resource with
>> > cTAKES for the fast dictionary lookup. There is cTAKES code and some
>> > documentation on how to create that resource. By default, cTAKES
>> > runs with a resource created from the English version of SNOMED CT and RxNORM.
>> > Hope this helps.
>> > --Guergana
>> >
>> > -----Original Message-----
>> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> > Sent: Tuesday, December 8, 2015 10:01 AM
>> > To: dev@ctakes.apache.org
>> > Subject: ctakes with icd10
>> >
>> > Hi,
>> >
>> > I downloaded the latest UMLS version, and I want to know how to make
>> > ctakes work with icd10 and icd9.
>> >
>> >
>> > Thanks
>> >