ctakes-dev mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: ctakes with icd10; 2015 versions available on sourceforge!
Date Wed, 09 Dec 2015 05:11:57 GMT
Brandon,
That sounds great!
Please open a Jira ticket for any contributions (anyone should be able
to create a Jira account).  There are some legal items built into the
ASF Jira attachments for accepting contributions/donations.
It will also credit contributors appropriately.
Anyone who is interested can follow the Jira item. (Even better if
contributions are made through open discussion and open development.)
--Pei

On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
<bdgeise@geisinger.edu> wrote:
> I'd be interested in contributing to making the dictionary tool more
> user friendly with a GUI.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Tuesday, December 08, 2015 6:12 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
>
> Hi Dave,
>
> I'm always happy to see interest in our stuff!
>
>>Step 1
> I built the tool to be able to build a dictionary using anything in the
> umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't
> be a problem.  You just add it to the CtakesSources file (or create an
> alternate file and point to it with -src).  To answer another of your
> questions, there can be zero or more sources - you saw snomedct and
> snomedct_us (each valid in a different umls version).
> It also can include any semantic type, just add (or remove) the
> appropriate tuis in a different data file.
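
A minimal sketch of creating such an alternate sources file, assuming the file is simply one UMLS source abbreviation per line (as Dave's description of his CtakesSources.txt suggests). The class name, the output file name VetSources.txt, and the abbreviation SNOMEDCT_VET for the veterinary extension are assumptions, not taken from the tool itself; check them against your UMLS release before use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class VetSources {
    // Assumed source abbreviations. SNOMEDCT_VET is a guess at the
    // veterinary extension's name in the UMLS; verify it in your release.
    static final List<String> SOURCES = List.of("SNOMEDCT_US", "RXNORM", "SNOMEDCT_VET");

    // Writes one source abbreviation per line and returns the file path.
    static Path writeSources(Path file, List<String> sources) throws IOException {
        return Files.write(file, sources);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; pass its location to the tool with -src.
        Path out = writeSources(Path.of("VetSources.txt"), SOURCES);
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

The resulting file would then be handed to the dictionary tool with the -src option mentioned above.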
>
>>Step 2
> You have it right - you copy the templates to another location and output
> to that location.  Otherwise you 'lose' your templates.
>
>>Step 3 and 4
> The jar is built from source.  I need to (soon) check in updates to the
> source, and at the same time I can check in a default prebuilt .jar.  The
> lib/ directory is in the source repository.
>
> Various people have toyed with the idea of putting the tool into a ctakes
> module, putting it into an "installation package", making a gui ...  The
> best option (imo) is probably to make an easy to use gui and keep a
> pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get
> a chance to do that ...
>
> Sean
>
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> Sent: Tuesday, December 08, 2015 4:57 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>
> Thanks, Sean! It's great that cTAKES may soon have an up to date database
> out of the box. Hopefully it will cut down on the need for many to build
> their own DBs. Thank you very much for doing that.
>
> Unfortunately, I still will need to build a custom one for us. I work in
> veterinary medicine so I need to add in the veterinary extension for
> SNOMED-CT into the database.
>
> I looked over the steps below that Brandon included and have some questions:
>
> step 1 says to "Change /data/default/CtakesSources.txt from 'SNOMEDCT' to
> 'SNOMEDCT_US'". The file that I have has two lines in it. First line is
> SNOMED, second line is SNOMEDCT_US. So this step doesn't really make sense.
>
> step 2 should reference the two scripts as being in resource/memdbtemplate
> so others don't have to search for them. Not sure what it means to move
> them to "location to put new UMLS DB". Does that mean move them into a new
> directory where the newly created UMLS DB will get written?
>
> steps 3 and 4 for running the tools reference dictionarytool.jar which
> doesn't exist. Does one need to build that somehow from the source before
> running it? The command line also adds "lib/*" to the classpath. Is that
> the lib directory inside the dictionarytool source code or some other
> location?
>
> What else would I need to do to include the SNOMED-CT Veterinary Extension
> along with the snomedct and rxnorm sources?
>
> I'll probably not have time to try this out for a while yet, but when I do
> I'd be happy to write up an easy to follow tutorial for building a custom
> dictionary assuming I am able to get it to work.
>
> Has anyone considered making this tool available outside of the source
> code itself? Like including it in the main cTAKES release? It seems there
> is demand for it.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean
> <Sean.Finan@childrens.harvard.edu> wrote:
>
>> Hi Brandon, thanks for finding and forwarding the instructions!
>>
>> I have checked in two new hsqldb dictionaries, both from the 2015AB
>> version of the UMLS.  They both have codes for snomedct_us, rxnorm,
>> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>>
>> One uses cuis filtered by snomed and rxnorm, the other adds cuis
>> filtered by icd9 and icd10.
>> What this means:  Cuis that exist for a [filter source] are added to
>> the dictionary, as are all text variations from all sources that
>> contain that cui.  Both dictionaries also use the standard ctakes
>> semantic group tui filters.
>>
>> The names are ctakessnorx2015 and ctakesicd2015
>>
>> The snomed rxnorm :
>>
>> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>>
>> The snomed rxnorm icd9 icd10:
>>
>> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>>
>> The svn root for the whole ugly thing is:
>>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>>
>> Stats:
>> ctakessnorx2015
>> 545,913 Terms
>> 229,251 Concepts (Cuis)
>> 272,987 Snomed codes
>> 32,419 Rxnorm codes
>> 11,321 icd9 codes
>> 61 icd10 codes
>>
>> ctakesicd2015
>> 611,230 Terms
>> 282,211 Concepts
>> 18,626 icd9 codes
>> 45,818 icd10 codes
>> Snomed and Rxnorm counts are the same
>>
>> So, adding the icd filters gave us an extra ~53,000 concepts and
>> ~65,000 terms.
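
The rounded figures follow directly from the stats above. A small sketch that redoes the subtraction (the class and method names are only illustrative):

```java
public class DictionaryDelta {
    // Difference between the icd-filtered dictionary and the snomed/rxnorm one.
    static int delta(int withIcd, int base) {
        return withIcd - base;
    }

    public static void main(String[] args) {
        // Concepts: ctakesicd2015 vs ctakessnorx2015
        System.out.println("Extra concepts: " + delta(282_211, 229_251)); // 52960
        // Terms: ctakesicd2015 vs ctakessnorx2015
        System.out.println("Extra terms: " + delta(611_230, 545_913));    // 65317
    }
}
```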
>>
>> I would like to move this all to a better root (not
>> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
>> write directly in trunk (??) and need to get moving on to other things.
>>
>> There is help on the ctakes wiki:
>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>> Though I should probably add a few items ...
>>
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>> Sent: Tuesday, December 08, 2015 12:51 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: ctakes with icd10
>>
>> Not to perpetuate the instructions again but I sent these out not long
>> ago when I was going through the process and Sean was helping me.
>>
>>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>>         3. Run DictionaryCreator2
>>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>         4. Run CodeMapCreator
>>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>> Thanks,
>> Brandon
>>
>> -----Original Message-----
>> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> Sent: Tuesday, December 08, 2015 12:47 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: ctakes with icd10
>>
>> This seems like a pretty common request and with such an old version
>> of UMLS database shipped with cTAKES it's only going to get worse.
>> I've been wanting to build a dictionary using the latest UMLS release
>> (as well as a custom database), so would be happy to write up the
>> steps as I go through it. That assumes that I can dig up the
>> instructions in the dev list.
>>
>> - Dave
>>
>> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
>> Sean.Finan@childrens.harvard.edu> wrote:
>>
>> > Hi Alaa,
>> >
>> > The -shortest- answer is that you'll need to run the dictionary
>> > creation tool.  There are instructions in older devlist threads.  By
>> > default the dictionary creation tool does add icd9 and icd10 tables
>> > to the dictionary.
>> > The problem is that in Umls 2011AB those codes weren't very well
>> > populated.  The 2015AB icd# set is much richer so those tables
>> > should be pretty good.  Then in ctakes you would look up annotations
>> > by icd9 or icd10 codes instead of by cui:
>> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>> >
>> > Sean
>> >
>> > -----Original Message-----
>> > From: Savova, Guergana
>> > [mailto:Guergana.Savova@childrens.harvard.edu]
>> > Sent: Tuesday, December 08, 2015 12:17 PM
>> > To: dev@ctakes.apache.org
>> > Subject: RE: ctakes with icd10
>> >
>> > Hi Alaa,
>> > You need to create a resource off the terminology/ontology you want
>> > to use (in this case ICD9 or ICD10). Then run that resource with
>> > cTAKES for the fast dictionary lookup. There is cTAKES code and some
>> > documentation on how to create that resource. By default, cTAKES
>> > runs with a resource created from the English version of SNOMED CT and RxNORM.
>> > Hope this helps.
>> > --Guergana
>> >
>> > -----Original Message-----
>> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> > Sent: Tuesday, December 8, 2015 10:01 AM
>> > To: dev@ctakes.apache.org
>> > Subject: ctakes with icd10
>> >
>> > Hi,
>> >
>> > I downloaded the latest UMLS version, and I want to know how to make
>> > ctakes work with icd10 and icd9.
>> >
>> >
>> > Thanks
>> >
>>
>>
