ctakes-dev mailing list archives

From Alaa al Barari <alaa.albar...@gmail.com>
Subject Re: ctakes with icd10; 2015 versions available on sourceforge!
Date Wed, 09 Dec 2015 23:07:16 GMT
Thanks Finan and Brandon, your help is appreciated a lot.

I downloaded the dictionary tool from
https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
I hope it's the latest version and bug-free.


My command is:

java -cp ./dictionarytool.jar:lib/* \
  org.apache.ctakes.dictionarytool.DictionaryCreator2 \
  -umls /home/abarari/Desktop/umls/2015AB/META/ \
  -atui ./data/optional/CtakesAnatTuis.txt \
  -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 \
  -tbl CUI_TERMS \
  -df ./data/optional/ \
  -src ./data/small/ConversionSources.txt \
  -tui ./data/optional/CtakesAllTuis.txt
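
The separator inside the -cp value is platform dependent: ":" on Linux and macOS, ";" on Windows. That is why the command above uses ":" while the quoted Windows instructions further down use ";". A minimal sketch that builds the -cp value portably with File.pathSeparator follows; ClasspathArg is a hypothetical helper for illustration and is not part of the dictionary tool.

```java
import java.io.File;

/** Hypothetical helper (not part of the dictionary tool): builds a -cp value. */
final class ClasspathArg {

    /** Joins classpath entries with the given separator (":" or ";"). */
    static String classpath(String separator, String... entries) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        String cp = classpath(File.pathSeparator, "./dictionarytool.jar", "lib/*");
        System.out.println("-cp " + cp);
    }
}
```

When the command is launched from Java (for example with ProcessBuilder), the "lib/*" entry reaches the JVM unexpanded, and the java launcher itself expands it to the jars in lib/.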



I am running on Ubuntu, by the way. Under
/home/abarari/Desktop/dictionarytool/output/
there is only:

 abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script


Where is the database? Am I doing something wrong? Do I need to create
the database before running the dictionary tool?
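
For context on the question above: in HSQLDB's file mode there is no single database file. The files that share the name given in the JDBC URL (.script, .properties and, until a clean shutdown, .log) together are the database, so the listing above is what a file-mode database looks like on disk. Whether the tables were actually populated is a separate question that the file names alone do not answer. The sketch below only derives and checks those file names from a jdbc:hsqldb:file: URL; HsqldbFileCheck and catalogFiles are hypothetical names, not part of the dictionary tool or of HSQLDB.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the dictionary tool or of HSQLDB). */
final class HsqldbFileCheck {
    private static final String PREFIX = "jdbc:hsqldb:file:";

    /** Returns the .script, .properties and .log paths for a file-mode URL. */
    static List<Path> catalogFiles(String jdbcUrl) {
        if (!jdbcUrl.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a file-mode HSQLDB URL: " + jdbcUrl);
        }
        // Drop any ";property=value" suffix that may follow the path.
        String base = jdbcUrl.substring(PREFIX.length()).split(";", 2)[0];
        List<Path> files = new ArrayList<>();
        for (String ext : new String[] {".script", ".properties", ".log"}) {
            files.add(Paths.get(base + ext));
        }
        return files;
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0]
            : "jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015";
        // Report which of the expected files are present on disk.
        for (Path p : catalogFiles(url)) {
            System.out.println(p + (Files.exists(p) ? "  (present)" : "  (missing)"));
        }
    }
}
```

A .log file that is still present usually means the database was not shut down cleanly or is still open; opening the database with an HSQLDB client and querying the CUI_TERMS table is the direct way to see whether rows were written.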


I found a couple of issues in the dictionary tool; in particular, it does
not work well with relative paths.
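
One possible workaround for the relative-path issue, as a sketch only: resolve each path argument against a known base directory and pass the absolute result to the tool. AbsoluteArgs and absolute are hypothetical names for illustration; the dictionary tool provides nothing like this, and the sketch does not change how the tool handles paths internally.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of the dictionary tool). */
final class AbsoluteArgs {

    /** Resolves arg against baseDir and removes "." and ".." segments. */
    static String absolute(Path baseDir, String arg) {
        // If arg is already absolute, resolve() returns it unchanged.
        return baseDir.resolve(arg).toAbsolutePath().normalize().toString();
    }

    public static void main(String[] args) {
        // Assumes the program is started from the dictionarytool directory.
        Path toolDir = Paths.get("").toAbsolutePath();
        String[] relative = {
            "./data/optional/CtakesAnatTuis.txt",
            "./data/small/ConversionSources.txt",
            "./data/optional/CtakesAllTuis.txt"
        };
        for (String arg : relative) {
            System.out.println(absolute(toolDir, arg));
        }
    }
}
```

The printed paths can then replace the ./data/... arguments for -atui, -src and -tui in the command above.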


On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <chenpei@apache.org> wrote:

> Brandon,
> That sounds great!
> Please open a Jira ticket for any contributions (anyone should be able
> to create a Jira account).  There are some legal items built into the
> ASF Jira attachments for accepting contributions/donations.
> It will also credit the contributors with the merit appropriately.
> Anyone who is interested can follow the Jira item. (Even better if
> contributions were open discussion/open development.)
> --Pei
>
> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
> <bdgeise@geisinger.edu> wrote:
> > I'd be interested in contributing to making the dictionary tool more
> > user friendly with a GUI.
> >
> > Thanks,
> > Brandon
> >
> > -----Original Message-----
> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> > Sent: Tuesday, December 08, 2015 6:12 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
> >
> > Hi Dave,
> >
> > I'm always happy to see interest in our stuff!
> >
> >>Step 1
> > I built the tool to be able to build a dictionary using anything in the
> > umls - snomed, icd9, hpo, etc. so using the veterinary extension
> > shouldn't be a problem.  You just add it to the CtakesSources file (or
> > create an alternate file and point to it with -src).  To answer another
> > of your questions, there can be zero or more sources - you saw snomedct
> > and snomedct_us (each valid in a different umls version).
> > It also can include any semantic type, just add (or remove) the
> > appropriate tuis in a different data file.
> >
> >>Step 2
> > You have it right - you copy the templates to another location and
> > output to that location.  Otherwise you 'lose' your templates.
> >
> >>Step 3 and 4
> > The jar is built from source.  I need to (soon) check in updates to the
> > source, and at the same time I can check in a default prebuilt .jar.  The
> > lib/ directory is in the source repository.
> >
> > Various people have toyed with the idea of putting the tool into a
> > ctakes module, putting it into an "installation package", making a gui
> > ...  The best option (imo) is probably to make an easy to use gui and
> > keep a pre-built version in sandbox.  Someday, after the rainbow, maybe
> > I'll get a chance to do that ...
> >
> > Sean
> >
> >
> > -----Original Message-----
> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
> > Sent: Tuesday, December 08, 2015 4:57 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
> >
> > Thanks, Sean! It's great that cTAKES may soon have an up to date
> > database out of the box. Hopefully it will cut down on the need for many
> > to build their own DBs. Thank you much for doing that.
> >
> > Unfortunately, I still will need to build a custom one for us. I work in
> > veterinary medicine so I need to add in the veterinary extension for
> > SNOMED-CT into the database.
> >
> > I looked over the steps below that Brandon included and have some
> > questions:
> >
> > step 1 says to 'Change /data/default/CtakesSources.txt from "SNOMEDCT"
> > to "SNOMEDCT_US"'. The file that I have has two lines in it. First line
> > is SNOMED, second line is SNOMEDCT_US. So this step doesn't really make
> > sense.
> >
> > step 2 should reference the two scripts as being in
> > resource/memdbtemplate so others don't have to search for them. Not sure
> > what it means to move them to "location to put new UMLS DB". Does that
> > mean move them into a new directory where the newly created UMLS DB will
> > get written?
> >
> > steps 3 and 4 for running the tools reference dictionarytool.jar which
> > doesn't exist. Does one need to build that somehow from the source
> > before running it? The command line also adds "lib/*" to the classpath.
> > Is that the lib directory inside the dictionarytool source code or some
> > other location?
> >
> > What else would I need to do to include the SNOMED-CT Veterinary
> > Extension along with the snomedct and rxnorm sources?
> >
> > I'll probably not have time to try this out for a while yet, but when I
> > do I'd be happy to write up an easy to follow tutorial for building a
> > custom dictionary assuming I am able to get it to work.
> >
> > Has anyone considered making this tool available outside of the source
> > code itself? Like including it in the main cTAKES release? It seems
> > there is demand for it.
> >
> > - Dave
> >
> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
> >
> >> Hi Brandon, thanks for finding and forwarding the instructions!
> >>
> >> I have checked in two new hsqldb dictionaries, both from the 2015AB
> >> version of the UMLS.  They both have codes for snomedct_us, rxnorm,
> >> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term
> >> mappings.
> >>
> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis
> >> filtered by icd9 and icd10.
> >> What this means:  Cuis that exist for a [filter source] are added to
> >> the dictionary, as are all text variations from all sources that
> >> contain that cui.  Both dictionaries also use the standard ctakes
> >> semantic group tui filters.
> >>
> >> The names are ctakessnorx2015 and ctakesicd2015
> >>
> >> The snomed rxnorm :
> >>
> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
> >>
> >> The snomed rxnorm icd9 icd10:
> >>
> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
> >>
> >> The svn root for the whole ugly thing is:
> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
> >>
> >> Stats:
> >> ctakessnorx2015
> >> 545,913 Terms
> >> 229,251 Concepts (Cuis)
> >> 272,987 Snomed codes
> >> 32,419 Rxnorm codes
> >> 11,321 icd9 codes
> >> 61 icd10 codes
> >>
> >> Ctakesicd2015
> >> 611,230 Terms
> >> 282,211 Concepts
> >> 18,626 icd9 codes
> >> 45,818 icd10 codes
> >> Snomed and Rxnorm counts are the same
> >>
> >> So, adding the icd filters gave us an extra ~53,000 concepts and
> >> ~65,000 terms.
> >>
> >> I would like to move this all to a better root (not
> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
> >> write directly in trunk (??) and need to get moving on to other things.
> >>
> >> There is help on the ctakes wiki:
> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> >> Though I should probably add a few items ...
> >>
> >>
> >> Sean
> >>
> >>
> >> -----Original Message-----
> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> >> Sent: Tuesday, December 08, 2015 12:51 PM
> >> To: dev@ctakes.apache.org
> >> Subject: RE: ctakes with icd10
> >>
> >> Not to perpetuate the instructions again but I sent these out not long
> >> ago when I was going through the process and Sean was helping me.
> >>
> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to
> >>            "SNOMEDCT_US"
> >>         2. Copy ctakesumls.properties and ctakesumls.script from
> >>            memdbtemplate to location to put new UMLS DB
> >>         3. Run DictionaryCreator2
> >>            java -cp dictionarytool.jar;lib/*
> >>            org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
> >>            "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db
> >>            jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>         4. Run CodeMapCreator
> >>            java -cp dictionarytool.jar;lib/*
> >>            org.apache.ctakes.dictionarytool.CodeMapCreator -umls
> >>            "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db
> >>            jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>         5. Copy new DB files to new location and create a copy of
> >>            cTakesHsql.xml and update dictionary location
> >>
> >> Thanks,
> >> Brandon
> >>
> >> -----Original Message-----
> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> >> Sent: Tuesday, December 08, 2015 12:47 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: ctakes with icd10
> >>
> >> This seems like a pretty common request and with such an old version
> >> of UMLS database shipped with cTAKES it's only going to get worse.
> >> I've been wanting to build a dictionary using the latest UMLS release
> >> (as well as a custom database), so would be happy to write up the
> >> steps as I go through it. That assumes that I can dig up the
> >> instructions in the dev list.
> >>
> >> - Dave
> >>
> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>
> >> > Hi Alaa,
> >> >
> >> > The -shortest- answer is that you'll need to run the dictionary
> >> > creation tool.  There are instructions in older devlist threads.  By
> >> > default the dictionary creation tool does add icd9 and icd10 tables
> >> > to the dictionary.
> >> > The problem is that in Umls 2011AB those codes weren't very well
> >> > populated.  The 2015AB icd# set is much more rich so those tables
> >> > should be pretty good.  Then in ctakes you would look up annotations
> >> > by icd9 or icd10 codes instead of by cui:
> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >> >
> >> > Sean
> >> >
> >> > -----Original Message-----
> >> > From: Savova, Guergana
> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
> >> > Sent: Tuesday, December 08, 2015 12:17 PM
> >> > To: dev@ctakes.apache.org
> >> > Subject: RE: ctakes with icd10
> >> >
> >> > Hi Alaa,
> >> > You need to create a resource off the terminology/ontology you want
> >> > to use (in this case ICD9 or ICD10). Then run that resource with
> >> > cTAKES for the fast dictionary lookup. There is cTAKES code and some
> >> > documentation on how to create that resource. By default, cTAKES
> >> > runs with a resource created from the English version of SNOMED CT
> >> > and RxNORM.
> >> > Hope this helps.
> >> > --Guergana
> >> >
> >> > -----Original Message-----
> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> >> > Sent: Tuesday, December 8, 2015 10:01 AM
> >> > To: dev@ctakes.apache.org
> >> > Subject: ctakes with icd10
> >> >
> >> > Hi,
> >> >
> >> > I downloaded Latest umls version, and I want to know how to make
> >> > ctakes work with icd10 and icd9.
> >> >
> >> >
> >> > Thanks
> >> >
> >>
> >>
>



-- 
Eng Alaa Al-Barari
phone 0599297470
