ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: Fast Dictionary Update
Date Wed, 16 Sep 2015 21:17:16 GMT
Ah, now I see what you mean.  Can you grep your MRCONSO.RRF for "SNOMEDCT"?
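
A minimal sketch of that check, for anyone without grep at hand. It counts MRCONSO.RRF rows per source abbreviation whose name contains "SNOMEDCT". It relies on the standard UMLS layout in which MRCONSO.RRF is pipe-delimited and SAB (the source abbreviation) is the 12th field. The function name `count_sources` is made up for this illustration and is not part of the dictionary tool.

```python
from collections import Counter

# In the standard UMLS MRCONSO.RRF layout, SAB is the 12th pipe-delimited field.
SAB_INDEX = 11


def count_sources(mrconso_path, needle="SNOMEDCT"):
    """Count rows per source abbreviation whose name contains `needle`."""
    counts = Counter()
    with open(mrconso_path, encoding="utf-8") as rrf:
        for line in rrf:
            fields = line.rstrip("\n").split("|")
            # Skip malformed or short lines rather than failing on them.
            if len(fields) > SAB_INDEX and needle in fields[SAB_INDEX]:
                counts[fields[SAB_INDEX]] += 1
    return counts
```

Calling `count_sources("/path/to/UMLS/META/MRCONSO.RRF")` on a newer release would be expected to show names such as SNOMEDCT_US rather than plain SNOMEDCT, which is the mismatch discussed further down the thread.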

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu] 
Sent: Wednesday, September 16, 2015 4:04 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I tried changing as suggested.

Below is what I see for the snomed piece, but for RXNorm it writes terms at the end.

Reading list of Source Types from ./data/default/CtakesSources.txt
File Lines 1	 list of Source Types 1
Reading list of Tuis from ./data/tiny/CtakesSnomedTuis.txt
File Lines 24	 list of Tuis 24
Compiling list of Cuis with wanted Tuis using /patto/UMLS_Current_Version/META/MRSTY.RRF
File Line 200000	 Cuis 60895
File Line 300000	 Cuis 85750
File Line 400000	 Cuis 135098
File Line 600000	 Cuis 183925
File Line 1700000	 Cuis 376338
File Line 1800000	 Cuis 471009
File Line 1900000	 Cuis 568375
File Line 2100000	 Cuis 674715
File Line 2800000	 Cuis 903583
File Line 3300000	 Cuis 973791
File Lines 3370173	 Cuis 999451
..................................................File Line 100000	 Valid Cuis 0
..................................................File Line 200000	 Valid Cuis 0
..................................................File Line 300000	 Valid Cuis 0
..................................................File Line 400000	 Valid Cuis 0
..................................................File Line 500000	 Valid Cuis 0
..................................................File Line 600000	 Valid Cuis 0
..................................................File Line 700000	 Valid Cuis 0
..................................................File Line 800000	 Valid Cuis 0
..................................................File Line 900000	 Valid Cuis 0
..................................................File Line 1000000	 Valid Cuis 0
..................................................File Line 1100000	 Valid Cuis 0
..................................................File Line 1200000	 Valid Cuis 0
..................................................File Line 1300000	 Valid Cuis 0
..................................................File Line 1400000	 Valid Cuis 0
..................................................File Line 1500000	 Valid Cuis 0
..................................................File Line 1600000	 Valid Cuis 0
..................................................File Line 1700000	 Valid Cuis 0
..................................................File Line 1800000	 Valid Cuis 0
..................................................File Line 1900000	 Valid Cuis 0
..................................................File Line 2000000	 Valid Cuis 0
..................................................File Line 2100000	 Valid Cuis 0
..................................................File Line 2200000	 Valid Cuis 0
..................................................File Line 2300000	 Valid Cuis 0
..................................................File Line 2400000	 Valid Cuis 0
..................................................File Line 2500000	 Valid Cuis 0
..................................................File Line 2600000	 Valid Cuis 0
..................................................File Line 2700000	 Valid Cuis 0
..................................................File Line 2800000	 Valid Cuis 0
..................................................File Line 2900000	 Valid Cuis 0
..................................................File Line 3000000	 Valid Cuis 0
..................................................File Line 3100000	 Valid Cuis 0
..................................................File Line 3200000	 Valid Cuis 0
..................................................File Line 3300000	 Valid Cuis 0
..................................................File Line 3400000	 Valid Cuis 0
..................................................File Line 3500000	 Valid Cuis 0
..................................................File Line 3600000	 Valid Cuis 0
..................................................File Line 3700000	 Valid Cuis 0
..................................................File Line 3800000	 Valid Cuis 0
..................................................File Line 3900000	 Valid Cuis 0
..................................................File Line 4000000	 Valid Cuis 0
..................................................File Line 4100000	 Valid Cuis 0
..................................................File Line 4200000	 Valid Cuis 0
..................................................File Line 4300000	 Valid Cuis 0
..................................................File Line 4400000	 Valid Cuis 0
..................................................File Line 4500000	 Valid Cuis 0
..................................................File Line 4600000	 Valid Cuis 0
..................................................File Line 4700000	 Valid Cuis 0
..................................................File Line 4800000	 Valid Cuis 0
..................................................File Line 4900000	 Valid Cuis 0
..................................................File Line 5000000	 Valid Cuis 0
..................................................File Line 5100000	 Valid Cuis 0
..................................................File Line 5200000	 Valid Cuis 0
..................................................File Line 5300000	 Valid Cuis 0
..................................................File Line 5400000	 Valid Cuis 0
..................................................File Line 5500000	 Valid Cuis 0
..................................................File Line 5600000	 Valid Cuis 0
..................................................File Line 5700000	 Valid Cuis 0
..................................................File Line 5800000	 Valid Cuis 0
..................................................File Line 5900000	 Valid Cuis 0
..................................................File Line 6000000	 Valid Cuis 0
..................................................File Line 6100000	 Valid Cuis 0
..................................................File Line 6200000	 Valid Cuis 0
..................................................File Line 6300000	 Valid Cuis 0
..................................................File Line 6400000	 Valid Cuis 0
..................................................File Line 6500000	 Valid Cuis 0
..................................................File Line 6600000	 Valid Cuis 0
..................................................File Line 6700000	 Valid Cuis 0
..................................................File Line 6800000	 Valid Cuis 0
..................................................File Line 6900000	 Valid Cuis 0
..................................................File Line 7000000	 Valid Cuis 0
..................................................File Line 7100000	 Valid Cuis 0
..................................................File Line 7200000	 Valid Cuis 0
..................................................File Line 7300000	 Valid Cuis 0
..................................................File Line 7400000	 Valid Cuis 0
..................................................File Line 7500000	 Valid Cuis 0
..................................................File Line 7600000	 Valid Cuis 0
..................................................File Line 7700000	 Valid Cuis 0
..................................................File Line 7800000	 Valid Cuis 0
..................................................File Line 7900000	 Valid Cuis 0
..................................................File Line 8000000	 Valid Cuis 0
..................................................File Line 8100000	 Valid Cuis 0
..................................................File Line 8200000	 Valid Cuis 0
..................................................File Line 8300000	 Valid Cuis 0
..................................................File Line 8400000	 Valid Cuis 0
..................................................File Line 8500000	 Valid Cuis 0
..................................................File Line 8600000	 Valid Cuis 0
..................................................File Line 8700000	 Valid Cuis 0
..................................................File Line 8800000	 Valid Cuis 0
.............File Lines 8827152	 Valid Cuis 0
Compiling map of Umls Cuis and Texts
..................................................File Line 100000	 Terms 0
..................................................File Line 200000	 Terms 0
..................................................File Line 300000	 Terms 0
..................................................File Line 400000	 Terms 0
..................................................File Line 500000	 Terms 0
..................................................File Line 600000	 Terms 0
..................................................File Line 700000	 Terms 0
..................................................File Line 800000	 Terms 0
..................................................File Line 900000	 Terms 0
..................................................File Line 1000000	 Terms 0
..................................................File Line 1100000	 Terms 0
..................................................File Line 1200000	 Terms 0
..................................................File Line 1300000	 Terms 0
..................................................File Line 1400000	 Terms 0
..................................................File Line 1500000	 Terms 0
..................................................File Line 1600000	 Terms 0
..................................................File Line 1700000	 Terms 0
..................................................File Line 1800000	 Terms 0
..................................................File Line 1900000	 Terms 0
..................................................File Line 2000000	 Terms 0
..................................................File Line 2100000	 Terms 0
..................................................File Line 2200000	 Terms 0
..................................................File Line 2300000	 Terms 0
..................................................File Line 2400000	 Terms 0
..................................................File Line 2500000	 Terms 0
..................................................File Line 2600000	 Terms 0
..................................................File Line 2700000	 Terms 0
..................................................File Line 2800000	 Terms 0
..................................................File Line 2900000	 Terms 0
..................................................File Line 3000000	 Terms 0
..................................................File Line 3100000	 Terms 0
..................................................File Line 3200000	 Terms 0
..................................................File Line 3300000	 Terms 0
..................................................File Line 3400000	 Terms 0
..................................................File Line 3500000	 Terms 0
..................................................File Line 3600000	 Terms 0
..................................................File Line 3700000	 Terms 0
..................................................File Line 3800000	 Terms 0
..................................................File Line 3900000	 Terms 0
..................................................File Line 4000000	 Terms 0
..................................................File Line 4100000	 Terms 0
..................................................File Line 4200000	 Terms 0
..................................................File Line 4300000	 Terms 0
..................................................File Line 4400000	 Terms 0
..................................................File Line 4500000	 Terms 0
..................................................File Line 4600000	 Terms 0
..................................................File Line 4700000	 Terms 0
..................................................File Line 4800000	 Terms 0
..................................................File Line 4900000	 Terms 0
..................................................File Line 5000000	 Terms 0
..................................................File Line 5100000	 Terms 0
..................................................File Line 5200000	 Terms 0
..................................................File Line 5300000	 Terms 0
..................................................File Line 5400000	 Terms 0
..................................................File Line 5500000	 Terms 0
..................................................File Line 5600000	 Terms 0
..................................................File Line 5700000	 Terms 0
..................................................File Line 5800000	 Terms 0
..................................................File Line 5900000	 Terms 0
..................................................File Line 6000000	 Terms 0
..................................................File Line 6100000	 Terms 0
..................................................File Line 6200000	 Terms 0
..................................................File Line 6300000	 Terms 0
..................................................File Line 6400000	 Terms 0
..................................................File Line 6500000	 Terms 0
..................................................File Line 6600000	 Terms 0
..................................................File Line 6700000	 Terms 0
..................................................File Line 6800000	 Terms 0
..................................................File Line 6900000	 Terms 0
..................................................File Line 7000000	 Terms 0
..................................................File Line 7100000	 Terms 0
..................................................File Line 7200000	 Terms 0
..................................................File Line 7300000	 Terms 0
..................................................File Line 7400000	 Terms 0
..................................................File Line 7500000	 Terms 0
..................................................File Line 7600000	 Terms 0
..................................................File Line 7700000	 Terms 0
..................................................File Line 7800000	 Terms 0
..................................................File Line 7900000	 Terms 0
..................................................File Line 8000000	 Terms 0
..................................................File Line 8100000	 Terms 0
..................................................File Line 8200000	 Terms 0
..................................................File Line 8300000	 Terms 0
..................................................File Line 8400000	 Terms 0
..................................................File Line 8500000	 Terms 0
..................................................File Line 8600000	 Terms 0
..................................................File Line 8700000	 Terms 0
..................................................File Line 8800000	 Terms 0
.............File Line 8827152	 Terms 0
Writing map of Cuis and Texts to \pathto\Umls2015.bsv
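
A quick way to confirm what a trial run actually wrote is to summarize the .bsv output. The sketch below assumes only that the file is bar-separated with the CUI in the first column; the exact column layout is not shown in this thread, and `summarize_bsv` is a hypothetical helper name.

```python
def summarize_bsv(bsv_path):
    """Return (row_count, distinct_first_field_count) for a bar-separated file."""
    rows = 0
    first_fields = set()
    with open(bsv_path, encoding="utf-8") as bsv:
        for line in bsv:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            rows += 1
            # Assumption: the first bar-separated column holds the CUI.
            first_fields.add(line.split("|", 1)[0])
    return rows, len(first_fields)
```

A result of `(0, 0)` would match the "Terms 0" counts in the log above.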

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 4:00 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thank you!  I believe that change came after 2011.  You should actually be OK with both SNOMEDCT
and SNOMEDCT_US in CtakesSources.txt.

Cheers,
Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.maite@gmail.com]
Sent: Wednesday, September 16, 2015 3:43 PM
To: dev@ctakes.apache.org
Subject: Re: Fast Dictionary Update

If this helps, I had to replace 'SNOMEDCT' with 'SNOMEDCT_US' in CtakesSources.txt.
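
A sketch of how one might spot this kind of mismatch up front: list the source names in the sources file that never occur as a SAB in MRCONSO.RRF. It assumes the sources file holds one source name per line (the tool's log suggests this but does not confirm it) and that SAB is the 12th pipe-delimited field of MRCONSO.RRF, as in the standard UMLS layout. `missing_sources` is a hypothetical name, not part of the dictionary tool.

```python
def missing_sources(sources_path, mrconso_path):
    """Return source names from sources_path that never appear as a SAB in MRCONSO.RRF."""
    # Assumption: one source name per line, blank lines ignored.
    with open(sources_path, encoding="utf-8") as src:
        wanted = {line.strip() for line in src if line.strip()}

    found = set()
    with open(mrconso_path, encoding="utf-8") as rrf:
        for line in rrf:
            fields = line.split("|")
            if len(fields) > 11:  # SAB is the 12th field (index 11)
                found.add(fields[11])

    return sorted(wanted - found)
```

If this returned `['SNOMEDCT']` for a 2015 release, that would point to the same rename reported here.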

On Wed, Sep 16, 2015 at 2:33 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> I'm not sure that I understand your question.  As I sent it, the anat, 
> snomed and rxnorm are not separate runs.  The args line I sent earlier 
> is for a single run that will create a dictionary with snomed and 
> rxnorm terms.  The anatomy tui list has a special use in correctly 
> processing snomed codes.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 3:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Ok, hopefully one last question.
>
> Based on your example everything runs, however the Anat and Snomed 
> runs don't produce any valid CUIs but RXNorm does.  I'm not sure if 
> this has anything to do with it but every UMLS source read is against MRSTY.
>
> Here's my command
>
> java -cp dictionarytool.jar;lib/*
> org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls 
> /path/to/UMLS/META -fd ./data/tiny -atui 
> ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt 
> -ol \path\to\file\Umls2015.bsv
>
> Any suggestions?
>
> Thanks again,
> Brandon
>
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 3:05 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Yes, that will make the rare word dictionary in a memory-based hsql 
> database - the same as the default for the dictionary-lookup-fast module.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 2:42 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Thanks Sean, much appreciated.  To clarify, the example below would 
> create the dictionary for use with the rare word approach?
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 2:16 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I just checked in a bin/dictionarytool.zip.  It should have everything 
> that you need (.jar, lib/, data/).  Running
>
> java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 [args]
>
> should do the trick.
>
> To recreate a 2015 version of the current ctakes dictionary, the 
> arguments are:
>
> -umls my/path/to/2015AA/META -fd ./data/tiny
> -atui ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt
> -db jdbc:hsqldb:file:my/path/to/snorx2015 -tbl CUI_TERMS
>
> Create my/path/to/snorx2015 by copying 
> resources/memdbtemplate/ctakesumls.properties to 
> my/path/to/snorx2015.properties.  There is a resources/README about this.
>
> Before populating a DB, I usually do a trial run first, writing to a 
> flat file.  Replace "-db ... -tbl ..." with "-ol my/path/to/testout.bsv"
>
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:49 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Sean,
>
> That'd be great.
>
> I think I'm building it incorrectly because after I build the jar and 
> try to run specifying DictionaryCreator2 as the main class it says it 
> can't find it.  I'm not too familiar with Java and building 
> projects/jars so it could be my ignorance causing the problem.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 1:45 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I can send you a jar or commit one pre-built.  What goes wrong when 
> you try to build the tool?
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:23 PM
> To: 'dev@ctakes.apache.org'
> Subject: Fast Dictionary Update
>
> Does someone have the DictionaryTool jar available?  I'm having 
> trouble creating the jar file from the project and would like to be 
> able to create an updated UMLS fast dictionary for 2015.
>
> Thanks,
> Brandon