ctakes-dev mailing list archives

From Alaa al Barari <alaa.albar...@gmail.com>
Subject Re: ctakes with icd10; 2015 versions available on sourceforge!
Date Thu, 10 Dec 2015 15:37:58 GMT
Hi Finan,

I am sorry if I am asking too much but I am really stuck ...

1- Could you please give me a link where I can download the latest version
of dictionarytool?
2- The version I have always produces output for icd10pcs, although my -src
file lists icd10CM. Is icd10pcs statically added inside dictionarytool? If I
change it in the code, should it work?
3- After running the tool, lines like the ones below are added to the .script
file. Am I on the right track?
INSERT INTO CUI_TERMS VALUES(20417,1,2,'hyoid bones','bones')
INSERT INTO CUI_TERMS VALUES(20417,0,2,'os hyoideum','os')

4- As naive as this sounds, what is a TUI inside CtakesAnatTuis.txt?

5- Is there any documentation you advise me to read?
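
One way to sanity-check rows like the two in question 3 is to parse them and confirm that the numbers agree with the text. The sketch below assumes (this is inferred from the sample rows in this thread, not taken from the tool's documentation) that the five values are a CUI, the zero-based position of the lookup word within the term, the term's token count, the space-separated term text, and the lookup word itself. The names `parse_row` and `row_is_consistent` are hypothetical, and the pattern does not handle terms containing escaped single quotes.

```python
import re

# One HSQLDB insert of the form seen in the .script file.
# Assumed column meanings (inferred from the sample rows only):
# (cui, word_index, token_count, text, word)
ROW_PATTERN = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),(\d+),(\d+),'(.*)','(.*)'\)"
)


def parse_row(line):
    """Return (cui, word_index, token_count, text, word), or None if no match."""
    match = ROW_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    cui, word_index, token_count, text, word = match.groups()
    return int(cui), int(word_index), int(token_count), text, word


def row_is_consistent(row):
    """Check that the token count and word position agree with the term text."""
    _, word_index, token_count, text, word = row
    tokens = text.split(" ")
    return (
        len(tokens) == token_count
        and word_index < len(tokens)
        and tokens[word_index] == word
    )


sample = "INSERT INTO CUI_TERMS VALUES(20417,1,2,'hyoid bones','bones')"
print(row_is_consistent(parse_row(sample)))  # prints True
```

Under those assumptions both rows in question 3 are internally consistent, which suggests the tool wrote well-formed entries.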


On Thu, Dec 10, 2015 at 10:37 AM, Alaa al Barari <alaa.albarari@gmail.com>
wrote:

> Finan, where can I download the 2015 .properties files from SourceForge? Do
> those cover all the ICDs and SNOMED?
>
> I prefer to learn how to generate my own db because I will need to create
> my own later on, so your help is appreciated.
>
> On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari <alaa.albarari@gmail.com>
> wrote:
>
>> Thanks, but is what I ended up with wrong?
>> On Dec 10, 2015 4:26 AM, "Finan, Sean" <Sean.Finan@childrens.harvard.edu>
>> wrote:
>>
>>> Hi Alaa,
>>>
>>> If you downloaded the 2015 .properties and .script files then you do not
>>> need to run the dictionary creation tool.  Those databases are already
>>> populated and ready to use.
>>>
>>> Sean
>>>
>>>
>>> -----Original Message-----
>>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>>> Sent: Wednesday, December 09, 2015 6:33 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>>>
>>> So basically it looks like the path had Desktop capitalized; that's why it
>>> did not work.
>>>
>>> I ended up having rows like this inside ctakesicd2015.script:
>>>
>>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
>>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
>>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
>>>
>>>
>>> does this mean it worked ?
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari <alaa.albarari@gmail.com>
>>> wrote:
>>>
>>> > Thanks Finan and Brandon, your help is appreciated a lot.
>>> >
>>> > I downloaded the dictionary tool from
>>> > https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
>>> > I hope it's the latest and bug free.
>>> >
>>> >
>>> > My run command is:
>>> >
>>> > java -cp ./dictionarytool.jar:lib/* \
>>> >   org.apache.ctakes.dictionarytool.DictionaryCreator2 \
>>> >   -umls /home/abarari/Desktop/umls/2015AB/META/ \
>>> >   -atui ./data/optional/CtakesAnatTuis.txt \
>>> >   -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 \
>>> >   -tbl CUI_TERMS \
>>> >   -df ./data/optional/ \
>>> >   -src ./data/small/ConversionSources.txt \
>>> >   -tui ./data/optional/CtakesAllTuis.txt
>>> >
>>> >
>>> >
>>> > I am running on Ubuntu, by the way. Under
>>> > /home/abarari/Desktop/dictionarytool/output/ there is only:
>>> >
>>> > abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
>>> > ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script
>>> >
>>> >
>>> > Where is the database? Am I doing something wrong? Do I need to
>>> > create the database before executing the dictionarytool?
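
As Sean's later reply in this thread indicates, the .properties and .script files are the populated database, so no separate database file is expected next to them. A minimal sketch, assuming Python is at hand, that checks both files exist for the name given after `jdbc:hsqldb:file:` (the helper name `hsqldb_files_present` is hypothetical):

```python
from pathlib import Path

JDBC_PREFIX = "jdbc:hsqldb:file:"


def hsqldb_files_present(jdbc_url):
    """Return True if <name>.properties and <name>.script both exist."""
    if not jdbc_url.startswith(JDBC_PREFIX):
        raise ValueError("expected a URL starting with " + JDBC_PREFIX)
    base = Path(jdbc_url[len(JDBC_PREFIX):])
    return all(
        base.with_name(base.name + suffix).is_file()
        for suffix in (".properties", ".script")
    )
```

For the run described here, the URL to check would be the `-db` value from the command, which points at the `ctakesicd2015` files in the output directory.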
>>> >
>>> >
>>> > I found a couple of issues in the dictionary tool, e.g. it does not
>>> > work well with relative paths.
>>> >
>>> >
>>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <chenpei@apache.org> wrote:
>>> >
>>> >> Brandon,
>>> >> That sounds great!
>>> >> Please open a Jira ticket for any contributions (anyone should be
>>> >> able to create a Jira account).  There are some legal items built
>>> >> into the ASF Jira attachments for accepting contributions/donations.
>>> >> It will also credit the contributors with the merit appropriately.
>>> >> Anyone who is interested can follow the Jira item. (Even better if
>>> >> contributions were open discussion/open development.)
>>> >> --Pei
>>> >>
>>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
>>> >> <bdgeise@geisinger.edu> wrote:
>>> >> > I'd be interested in contributing to making the dictionary tool
>>> >> > more user friendly with a GUI.
>>> >> >
>>> >> > Thanks,
>>> >> > Brandon
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
>>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
>>> >> > To: dev@ctakes.apache.org
>>> >> > Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
>>> >> >
>>> >> > Hi Dave,
>>> >> >
>>> >> > I'm always happy to see interest in our stuff!
>>> >> >
>>> >> >>Step 1
>>> >> > I built the tool to be able to build a dictionary using anything in
>>> >> > the umls - snomed, icd9, hpo, etc. so using the veterinary extension
>>> >> > shouldn't be a problem.  You just add it to the CtakesSources file
>>> >> > (or create an alternate file and point to it with -src).  To answer
>>> >> > another of your questions, there can be zero or more sources - you
>>> >> > saw snomedct and snomedct_us (each valid in a different umls version).
>>> >> > It also can include any semantic type, just add (or remove) the
>>> >> > appropriate tuis in a different data file.
>>> >> >
>>> >> >>Step 2
>>> >> > You have it right - you copy the templates to another location and
>>> >> > output to that location.  Otherwise you 'lose' your templates.
>>> >> >
>>> >> >>Step 3 and 4
>>> >> > The jar is built from source.  I need to (soon) check in updates to
>>> >> > the source, and at the same time I can check in a default prebuilt
>>> >> > .jar.  The lib/ directory is in the source repository.
>>> >> >
>>> >> > Various people have toyed with the idea of putting the tool into a
>>> >> > ctakes module, putting it into an "installation package", making a
>>> >> > gui ...  The best option (imo) is probably to make an easy to use gui
>>> >> > and keep a pre-built version in sandbox.  Someday, after the rainbow,
>>> >> > maybe I'll get a chance to do that ...
>>> >> >
>>> >> > Sean
>>> >> >
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
>>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
>>> >> > To: dev@ctakes.apache.org
>>> >> > Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>>> >> >
>>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to date
>>> >> > database out of the box. Hopefully it will cut down on the need for
>>> >> > many to build their own DB's. Thank you much for doing that.
>>> >> >
>>> >> > Unfortunately, I still will need to build a custom one for us. I
>>> >> > work in veterinary medicine so I need to add in the veterinary
>>> >> > extension for SNOMED-CT into the database.
>>> >> >
>>> >> > I looked over the steps below that Brandon included and have some
>>> >> > questions:
>>> >> >
>>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
>>> >> > "SNOMEDCT" to "SNOMEDCT_US"". The file that I have has two lines in
>>> >> > it. First line is SNOMED, second line is SNOMEDCT_US. So this step
>>> >> > doesn't really make sense.
>>> >> >
>>> >> > step 2 should reference the two scripts as being in
>>> >> > resource/memdbtemplate so others don't have to search for them. Not
>>> >> > sure what it means to move them to "location to put new UMLS DB".
>>> >> > Does that mean move them into a new directory where the newly created
>>> >> > UMLS DB will get written?
>>> >> >
>>> >> > steps 3 and 4 for running the tools reference dictionarytool.jar
>>> >> > which doesn't exist. Does one need to build that somehow from the
>>> >> > source before running it? The command line also adds "lib/*" to the
>>> >> > classpath. Is that the lib directory inside the dictionarytool source
>>> >> > code or some other location?
>>> >> >
>>> >> > What else would I need to do to include the SNOMED-CT Veterinary
>>> >> > Extension along with the snomedct and rxnorm sources?
>>> >> >
>>> >> > I'll probably not have time to try this out for a while yet, but
>>> >> > when I do I'd be happy to write up an easy to follow tutorial for
>>> >> > building a custom dictionary assuming I am able to get it to work.
>>> >> >
>>> >> > Has anyone considered making this tool available outside of the
>>> >> > source code itself? Like including it in the main cTAKES release? It
>>> >> > seems there is demand for it.
>>> >> >
>>> >> > - Dave
>>> >> >
>>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean
>>> >> > <Sean.Finan@childrens.harvard.edu> wrote:
>>> >> >
>>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
>>> >> >>
>>> >> >> I have checked in two new hsqldb dictionaries, both from the
>>> >> >> 2015AB version of the UMLS.  They both have codes for snomedct_us,
>>> >> >> rxnorm, icd9cm and icd10pcs - as well as the usual cui, tui,
>>> >> >> preferred term mappings.
>>> >> >>
>>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis
>>> >> >> filtered by icd9 and icd10.
>>> >> >> What this means:  Cuis that exist for a [filter source] are added
>>> >> >> to the dictionary, as are all text variations from all sources
>>> >> >> that contain that cui.  Both dictionaries also use the standard
>>> >> >> ctakes semantic group tui filters.
>>> >> >>
>>> >> >> The names are ctakessnorx2015 and ctakesicd2015
>>> >> >>
>>> >> >> The snomed rxnorm :
>>> >> >>
>>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>>> >> >>
>>> >> >> The snomed rxnorm icd9 icd10:
>>> >> >>
>>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>>> >> >>
>>> >> >> The svn root for the whole ugly thing is:
>>> >> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>>> >> >>
>>> >> >> Stats:
>>> >> >> ctakessnorx2015
>>> >> >> 545,913 Terms
>>> >> >> 229,251 Concepts (Cuis)
>>> >> >> 272,987 Snomed codes
>>> >> >> 32,419 Rxnorm codes
>>> >> >> 11,321 icd9 codes
>>> >> >> 61 icd10 codes
>>> >> >>
>>> >> >> Ctakesicd2015
>>> >> >> 611,230 Terms
>>> >> >> 282,211 Concepts
>>> >> >> 18,626 icd9 codes
>>> >> >> 45,818 icd10 codes
>>> >> >> Snomed and Rxnorm counts are the same
>>> >> >>
>>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts and
>>> >> >> ~65,000 terms.
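
The rounded figures follow directly from the stats listed above. A quick check of the arithmetic, with the counts copied from that list:

```python
# Term and concept counts copied from the stats above.
snorx = {"terms": 545_913, "concepts": 229_251}  # ctakessnorx2015
icd = {"terms": 611_230, "concepts": 282_211}    # ctakesicd2015

# Difference per category: what the icd filters added.
extra = {key: icd[key] - snorx[key] for key in snorx}
print(extra)  # {'terms': 65317, 'concepts': 52960}
```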
>>> >> >>
>>> >> >> I would like to move this all to a better root (not
>>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
>>> >> >> write directly in trunk (??) and need to get moving on to other
>>> >> >> things.
>>> >> >>
>>> >> >> There is help on the ctakes wiki:
>>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>>> >> >> Though I should probably add a few items ...
>>> >> >>
>>> >> >>
>>> >> >> Sean
>>> >> >>
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
>>> >> >> To: dev@ctakes.apache.org
>>> >> >> Subject: RE: ctakes with icd10
>>> >> >>
>>> >> >> Not to perpetuate the instructions again but I sent these out not
>>> >> >> long ago when I was going through the process and Sean was helping
>>> >> >> me.
>>> >> >>
>>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT"
>>> >> >>            to "SNOMEDCT_US"
>>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script from
>>> >> >>            memdbtemplate to location to put new UMLS DB
>>> >> >>         3. Run DictionaryCreator2
>>> >> >>            java -cp dictionarytool.jar;lib/*
>>> >> >>            org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
>>> >> >>            "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db
>>> >> >>            jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>> >> >>         4. Run CodeMapCreator
>>> >> >>            java -cp dictionarytool.jar;lib/*
>>> >> >>            org.apache.ctakes.dictionarytool.CodeMapCreator -umls
>>> >> >>            "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db
>>> >> >>            jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>> >> >>         5. Copy new DB files to new location and create a copy of
>>> >> >>            cTakesHsql.xml and update dictionary location
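
The commands in steps 3 and 4 use `;` as the classpath separator, which is the Windows convention; on Linux or macOS the separator is `:`. The sketch below, assuming Python is available, builds the step 3 command with the platform's separator and with absolute paths, since relative paths are reported elsewhere in this thread to cause trouble. The function name `build_creator_command` and its parameters are hypothetical; the class name and flags are the ones quoted in the steps above.

```python
import os
from pathlib import Path

MAIN_CLASS = "org.apache.ctakes.dictionarytool.DictionaryCreator2"


def build_creator_command(tool_dir, umls_meta, db_base, atui_file,
                          table="CUI_TERMS", sep=os.pathsep):
    """Build the argument list for step 3 using absolute paths."""
    tool_dir = Path(tool_dir).resolve()
    # The jar plus everything under lib/, joined with the platform separator.
    classpath = sep.join([
        str(tool_dir / "dictionarytool.jar"),
        str(tool_dir / "lib" / "*"),
    ])
    return [
        "java", "-cp", classpath, MAIN_CLASS,
        "-umls", str(Path(umls_meta).resolve()),
        "-atui", str(Path(atui_file).resolve()),
        "-db", "jdbc:hsqldb:file:" + str(Path(db_base).resolve()),
        "-tbl", table,
    ]
```

Returning a list rather than a single string means it can be handed to `subprocess.run` without a shell, so the `lib/*` wildcard reaches Java unexpanded and Java resolves it itself.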
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Brandon
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
>>> >> >> To: dev@ctakes.apache.org
>>> >> >> Subject: Re: ctakes with icd10
>>> >> >>
>>> >> >> This seems like a pretty common request and with such an old
>>> >> >> version of UMLS database shipped with cTAKES it's only going to
>>> >> >> get worse.
>>> >> >> I've been wanting to build a dictionary using the latest UMLS
>>> >> >> release (as well as a custom database), so would be happy to write
>>> >> >> up the steps as I go through it. That assumes that I can dig up
>>> >> >> the instructions in the dev list.
>>> >> >>
>>> >> >> - Dave
>>> >> >>
>>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
>>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
>>> >> >>
>>> >> >> > Hi Alaa,
>>> >> >> >
>>> >> >> > The -shortest- answer is that you'll need to run the dictionary
>>> >> >> > creation tool.  There are instructions in older devlist threads.
>>> >> >> > By default the dictionary creation tool does add icd9 and icd10
>>> >> >> > tables to the dictionary.
>>> >> >> > The problem is that in Umls 2011AB those codes weren't very well
>>> >> >> > populated.  The 2015AB icd# set is much more rich so those
>>> >> >> > tables should be pretty good.  Then in ctakes you would look up
>>> >> >> > annotations by icd9 or icd10 codes instead of by cui:
>>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>>> >> >> >
>>> >> >> > Sean
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: Savova, Guergana
>>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
>>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: RE: ctakes with icd10
>>> >> >> >
>>> >> >> > Hi Alaa,
>>> >> >> > You need to create a resource off the terminology/ontology you
>>> >> >> > want to use (in this case ICD9 or ICD10). Then run that resource
>>> >> >> > with cTAKES for the fast dictionary lookup. There is cTAKES code
>>> >> >> > and some documentation on how to create that resource. By
>>> >> >> > default, cTAKES runs with a resource created from the English
>>> >> >> > version of SNOMED CT and RxNORM.
>>> >> >> > Hope this helps.
>>> >> >> > --Guergana
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: ctakes with icd10
>>> >> >> >
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > I downloaded the latest umls version, and I want to know how to
>>> >> >> > make ctakes work with icd10 and icd9.
>>> >> >> >
>>> >> >> >
>>> >> >> > Thanks
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Eng Alaa Al-Barari
>>> > phone 0599297470
>>> >
>>>
>>>
>>>
>>> --
>>> Eng Alaa Al-Barari
>>> phone 0599297470
>>>
>>
>
>
> --
> Eng Alaa Al-Barari
> phone 0599297470
>



-- 
Eng Alaa Al-Barari
phone 0599297470
