ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: ctakes with icd10; 2015 versions available on sourceforge!
Date Tue, 08 Dec 2015 21:22:23 GMT
Hi Brandon, thanks for finding and forwarding the instructions!

I have checked in two new hsqldb dictionaries, both from the 2015AB version of the UMLS. 
They both have codes for snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the usual cui,
tui, preferred term mappings.

One uses cuis filtered by snomed and rxnorm, the other adds cuis filtered by icd9 and icd10.
What this means:  Cuis that exist for a [filter source] are added to the dictionary, as are
all text variations from all sources that contain that cui.  Both dictionaries also use the
standard ctakes semantic group tui filters.
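
A minimal sketch of that filtering rule, in case it helps to see it spelled out. This is not the dictionary tool's actual code: the Row class, the filter method, and the sample cuis and texts are all made up for illustration, and the semantic group tui filtering is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CuiFilterSketch {

    /** Hypothetical term row: a cui, the source vocabulary it came from, and its text. */
    static final class Row {
        final String cui;
        final String source;
        final String text;

        Row(String cui, String source, String text) {
            this.cui = cui;
            this.source = source;
            this.text = text;
        }
    }

    /** Keeps every row, from any source, whose cui appears in at least one filter source. */
    static List<Row> filter(List<Row> rows, Set<String> filterSources) {
        // Pass 1: collect the cuis that exist for a filter source.
        Set<String> wantedCuis = new HashSet<>();
        for (Row row : rows) {
            if (filterSources.contains(row.source)) {
                wantedCuis.add(row.cui);
            }
        }
        // Pass 2: keep all text variations for those cuis, regardless of source.
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (wantedCuis.contains(row.cui)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the cuis and texts are placeholders, not real UMLS content.
        List<Row> rows = Arrays.asList(
            new Row("C0000001", "SNOMEDCT_US", "example term a"),
            new Row("C0000001", "MSH", "example term a variant"),
            new Row("C0000002", "ICD10PCS", "example term b"));

        Set<String> snorx = new HashSet<>(Arrays.asList("SNOMEDCT_US", "RXNORM"));
        Set<String> withIcd = new HashSet<>(snorx);
        withIcd.addAll(Arrays.asList("ICD9CM", "ICD10PCS"));

        System.out.println(filter(rows, snorx).size());   // prints 2
        System.out.println(filter(rows, withIcd).size()); // prints 3
    }
}
```

Note that the MSH row is kept under the snomed/rxnorm filter because its cui also exists in SNOMEDCT_US. That is the "all text variations from all sources" part.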

The names are ctakessnorx2015 and ctakesicd2015

The snomed rxnorm:
http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/

The snomed rxnorm icd9 icd10:
http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/

The svn root for the whole ugly thing is:
 svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk

Stats:
ctakessnorx2015
545,913 Terms
229,251 Concepts (Cuis)
272,987 Snomed codes
32,419 Rxnorm codes
11,321 icd9 codes
61 icd10 codes

ctakesicd2015
611,230 Terms
282,211 Concepts
18,626 icd9 codes
45,818 icd10 codes
Snomed and Rxnorm counts are the same as for ctakessnorx2015

So, adding the icd filters gave us an extra ~53,000 concepts and ~65,000 terms.
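
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the differences from the stats above. The class and method names are made up for illustration; the counts are copied from the two lists.

```java
final class DictionaryStatsDelta {

    /** Difference between the icd-filtered dictionary count and the snomed/rxnorm one. */
    static int delta(int ctakesicd2015, int ctakessnorx2015) {
        return ctakesicd2015 - ctakessnorx2015;
    }

    public static void main(String[] args) {
        System.out.println("extra terms:       " + delta(611230, 545913)); // 65317
        System.out.println("extra concepts:    " + delta(282211, 229251)); // 52960
        System.out.println("extra icd9 codes:  " + delta(18626, 11321));   // 7305
        System.out.println("extra icd10 codes: " + delta(45818, 61));      // 45757
    }
}
```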

I would like to move this all to a better root (not ctakes-resources-snomed-rword-hsqldb-2011ab)
but I wasn't able to write directly in trunk (??) and need to get moving on to other things.

There is help on the ctakes wiki: https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
Though I should probably add a few items ...


Sean


-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu] 
Sent: Tuesday, December 08, 2015 12:51 PM
To: dev@ctakes.apache.org
Subject: RE: ctakes with icd10

Not to repeat the instructions yet again, but I sent these out not long ago when I was going
through the process and Sean was helping me.

	1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
	2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to the location where the new UMLS DB will go
	3. Run DictionaryCreator2
	java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
	4. Run CodeMapCreator
	java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
	5. Copy the new DB files to their new location, create a copy of cTakesHsql.xml, and update the dictionary location in it
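
The two commands in steps 3 and 4 use the Windows classpath separator (;). As a rough sketch, assuming you would rather assemble them in a platform-neutral way, something like the following could work. The class and method names are made up, the default paths are the same placeholders as above, and only the flags already shown in the steps are used. It prints the commands rather than running them.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DictionaryToolCommands {

    /** Builds the argument list for one of the two dictionary tool main classes. */
    static List<String> command(String mainClass, String umlsMeta, String dbPath) {
        // File.pathSeparator is ';' on Windows and ':' on Linux and macOS.
        String classpath = "dictionarytool.jar" + File.pathSeparator + "lib/*";
        return new ArrayList<>(Arrays.asList(
            "java", "-cp", classpath, mainClass,
            "-umls", umlsMeta,
            "-atui", "./data/tiny/CtakesAnatTuis.txt",
            "-db", "jdbc:hsqldb:file:" + dbPath,
            "-tbl", "CUI_TERMS"));
    }

    public static void main(String[] args) {
        // Placeholder paths; pass your own UMLS META directory and DB location.
        String umlsMeta = args.length > 0 ? args[0] : "pathToUmls/META";
        String dbPath = args.length > 1 ? args[1] : "pathTonewDB/snorx2015";
        String pkg = "org.apache.ctakes.dictionarytool.";

        // Step 3 first, then step 4.
        for (String tool : new String[] {"DictionaryCreator2", "CodeMapCreator"}) {
            System.out.println(String.join(" ", command(pkg + tool, umlsMeta, dbPath)));
        }
    }
}
```

Each list could be handed to java.lang.ProcessBuilder to actually run the tool from the dictionary tool directory; the java launcher expands the lib/* classpath wildcard itself, so no shell is needed for that.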

Thanks,
Brandon

-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com]
Sent: Tuesday, December 08, 2015 12:47 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10

This seems like a pretty common request and with such an old version of UMLS database shipped
with cTAKES it's only going to get worse. I've been wanting to build a dictionary using the
latest UMLS release (as well as a custom database), so would be happy to write up the steps
as I go through it. That assumes that I can dig up the instructions in the dev list.

- Dave

On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

> Hi Alaa,
>
> The -shortest- answer is that you'll need to run the dictionary 
> creation tool.  There are instructions in older devlist threads.  By 
> default the dictionary creation tool does add icd9 and icd10 tables to the dictionary.
> The problem is that in Umls 2011AB those codes weren't very well 
> populated.  The 2015AB icd# set is much richer so those tables 
> should be pretty good.  Then in ctakes you would look up annotations 
> by icd9 or icd10 codes instead of by cui:
> OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>
> Sean
>
> -----Original Message-----
> From: Savova, Guergana [mailto:Guergana.Savova@childrens.harvard.edu]
> Sent: Tuesday, December 08, 2015 12:17 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10
>
> Hi Alaa,
> You need to create a resource off the terminology/ontology you want to 
> use (in this case ICD9 or ICD10). Then run that resource with cTAKES 
> for the fast dictionary lookup. There is cTAKES code and some 
> documentation on how to create that resource. By default, cTAKES runs 
> with a resource created from the English version of SNOMED CT and RxNORM.
> Hope this helps.
> --Guergana
>
> -----Original Message-----
> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> Sent: Tuesday, December 8, 2015 10:01 AM
> To: dev@ctakes.apache.org
> Subject: ctakes with icd10
>
> Hi,
>
> I downloaded the latest UMLS version, and I want to know how to make 
> ctakes work with icd10 and icd9.
>
>
> Thanks
>

