ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: Fast Dictionary Update
Date Thu, 17 Sep 2015 01:58:56 GMT
Excellent! 

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu] 
Sent: Wednesday, September 16, 2015 9:55 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

No, I had changed it in the tiny source file. I just changed the default file, and it looks
to be running as expected now.

Thank you for all your help and patience, Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 9:35 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Did you add it to data/default/CtakesSources.txt?

If not, then you need to specify -src ./data/tiny/CtakesSources.txt

Sorry for any confusion.

As soon as my internet connection isn't overloaded, I'll download 2015AA and see if I can build a dictionary.

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 8:14 PM
To: dev@ctakes.apache.org; dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Sean,

I added that and still had the same issue.

Thanks,
Brandon
_____________________________
From: Finan, Sean <sean.finan@childrens.harvard.edu>
Sent: Wednesday, September 16, 2015 7:56 PM
Subject: RE: Fast Dictionary Update
To: <dev@ctakes.apache.org>


And you added "SNOMEDCT_US" to data/tiny/CtakesSources.txt?

-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu]
Sent: Wednesday, September 16, 2015 7:13 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I have exactly the same problem with the tool.

A grep on MRCONSO.RRF for "SNOMEDCT" or for "SNOMEDCT_US" shows many lines.
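A plain grep for "SNOMEDCT" also matches every row whose source abbreviation is "SNOMEDCT_US", so grep alone cannot tell the two apart. The sketch below (the class name SabCounter is hypothetical) instead compares the whole SAB field, assuming the standard pipe-delimited MRCONSO.RRF layout in which SAB is the 12th column. On a 2015 release it should show whether any row uses the bare "SNOMEDCT" value at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Counts MRCONSO.RRF rows per source abbreviation (SAB), comparing the whole field. */
final class SabCounter {
    // MRCONSO.RRF is "|"-delimited; SAB is the 12th field (index 11).
    private static final int SAB_INDEX = 11;

    static Map<String, Integer> countSabs(Reader in) {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps empty fields, so column positions stay fixed.
                String[] fields = line.split("\\|", -1);
                if (fields.length > SAB_INDEX) {
                    counts.merge(fields[SAB_INDEX], 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SabCounter /path/to/META/MRCONSO.RRF
        Map<String, Integer> counts =
            countSabs(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
        for (String sab : new String[] {"SNOMEDCT", "SNOMEDCT_US"}) {
            System.out.println(sab + ": " + counts.getOrDefault(sab, 0) + " rows");
        }
    }
}
```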

________________________________________
From: Geise, Brandon D. [bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 5:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, it finds "SNOMEDCT_US".

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 5:17 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ah, now I see what you mean. Can you do a grep on your MRCONSO.RRF for "SNOMEDCT" ?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 4:04 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I tried changing it as suggested.

Below is what I see for the SNOMED piece. For RxNorm, on the other hand, it does write terms at the end.

Reading list of Source Types from ./data/default/CtakesSources.txt
File Lines 1
list of Source Types 1
Reading list of Tuis from ./data/tiny/CtakesSnomedTuis.txt
File Lines 24
list of Tuis 24
Compiling list of Cuis with wanted Tuis using /patto/UMLS_Current_Version/META/MRSTY.RRF
File Line 200000 Cuis 60895
File Line 300000 Cuis 85750
File Line 400000 Cuis 135098
File Line 600000 Cuis 183925
File Line 1700000 Cuis 376338
File Line 1800000 Cuis 471009
File Line 1900000 Cuis 568375
File Line 2100000 Cuis 674715
File Line 2800000 Cuis 903583
File Line 3300000 Cuis 973791
File Lines 3370173 Cuis 999451
File Line 100000 Valid Cuis 0
File Line 200000 Valid Cuis 0
File Line 300000 Valid Cuis 0
File Line 400000 Valid Cuis 0
File Line 500000 Valid Cuis 0
File Line 600000 Valid Cuis 0
File Line 700000 Valid Cuis 0
File Line 800000 Valid Cuis 0
File Line 900000 Valid Cuis 0
File Line 1000000 Valid Cuis 0
File Line 1100000 Valid Cuis 0
File Line 1200000 Valid Cuis 0
File Line 1300000 Valid Cuis 0
File Line 1400000 Valid Cuis 0
File Line 1500000 Valid Cuis 0
File Line 1600000 Valid Cuis 0
File Line 1700000 Valid Cuis 0
File Line 1800000 Valid Cuis 0
File Line 1900000 Valid Cuis 0
File Line 2000000 Valid Cuis 0
File Line 2100000 Valid Cuis 0
File Line 2200000 Valid Cuis 0
File Line 2300000 Valid Cuis 0
File Line 2400000 Valid Cuis 0
File Line 2500000 Valid Cuis 0
File Line 2600000 Valid Cuis 0
File Line 2700000 Valid Cuis 0
File Line 2800000 Valid Cuis 0
File Line 2900000 Valid Cuis 0
File Line 3000000 Valid Cuis 0
File Line 3100000 Valid Cuis 0
File Line 3200000 Valid Cuis 0
File Line 3300000 Valid Cuis 0
File Line 3400000 Valid Cuis 0
File Line 3500000 Valid Cuis 0
File Line 3600000 Valid Cuis 0
File Line 3700000 Valid Cuis 0
File Line 3800000 Valid Cuis 0
File Line 3900000 Valid Cuis 0
File Line 4000000 Valid Cuis 0
File Line 4100000 Valid Cuis 0
File Line 4200000 Valid Cuis 0
File Line 4300000 Valid Cuis 0
File Line 4400000 Valid Cuis 0
File Line 4500000 Valid Cuis 0
File Line 4600000 Valid Cuis 0
File Line 4700000 Valid Cuis 0
File Line 4800000 Valid Cuis 0
File Line 4900000 Valid Cuis 0
File Line 5000000 Valid Cuis 0
File Line 5100000 Valid Cuis 0
File Line 5200000 Valid Cuis 0
File Line 5300000 Valid Cuis 0
File Line 5400000 Valid Cuis 0
File Line 5500000 Valid Cuis 0
File Line 5600000 Valid Cuis 0
File Line 5700000 Valid Cuis 0
File Line 5800000 Valid Cuis 0
File Line 5900000 Valid Cuis 0
File Line 6000000 Valid Cuis 0
File Line 6100000 Valid Cuis 0
File Line 6200000 Valid Cuis 0
File Line 6300000 Valid Cuis 0
File Line 6400000 Valid Cuis 0
File Line 6500000 Valid Cuis 0
File Line 6600000 Valid Cuis 0
File Line 6700000 Valid Cuis 0
File Line 6800000 Valid Cuis 0
File Line 6900000 Valid Cuis 0
File Line 7000000 Valid Cuis 0
File Line 7100000 Valid Cuis 0
File Line 7200000 Valid Cuis 0
File Line 7300000 Valid Cuis 0
File Line 7400000 Valid Cuis 0
File Line 7500000 Valid Cuis 0
File Line 7600000 Valid Cuis 0
File Line 7700000 Valid Cuis 0
File Line 7800000 Valid Cuis 0
File Line 7900000 Valid Cuis 0
File Line 8000000 Valid Cuis 0
File Line 8100000 Valid Cuis 0
File Line 8200000 Valid Cuis 0
File Line 8300000 Valid Cuis 0
File Line 8400000 Valid Cuis 0
File Line 8500000 Valid Cuis 0
File Line 8600000 Valid Cuis 0
File Line 8700000 Valid Cuis 0
File Line 8800000 Valid Cuis 0
File Lines 8827152 Valid Cuis 0
Compiling map of Umls Cuis and Texts
File Line 100000 Terms 0
File Line 200000 Terms 0
File Line 300000 Terms 0
File Line 400000 Terms 0
File Line 500000 Terms 0
File Line 600000 Terms 0
File Line 700000 Terms 0
File Line 800000 Terms 0
File Line 900000 Terms 0
File Line 1000000 Terms 0
File Line 1100000 Terms 0
File Line 1200000 Terms 0
File Line 1300000 Terms 0
File Line 1400000 Terms 0
File Line 1500000 Terms 0
File Line 1600000 Terms 0
File Line 1700000 Terms 0
File Line 1800000 Terms 0
File Line 1900000 Terms 0
File Line 2000000 Terms 0
File Line 2100000 Terms 0
File Line 2200000 Terms 0
File Line 2300000 Terms 0
File Line 2400000 Terms 0
File Line 2500000 Terms 0
File Line 2600000 Terms 0
File Line 2700000 Terms 0
File Line 2800000 Terms 0
File Line 2900000 Terms 0
File Line 3000000 Terms 0
File Line 3100000 Terms 0
File Line 3200000 Terms 0
File Line 3300000 Terms 0
File Line 3400000 Terms 0
File Line 3500000 Terms 0
File Line 3600000 Terms 0
File Line 3700000 Terms 0
File Line 3800000 Terms 0
File Line 3900000 Terms 0
File Line 4000000 Terms 0
File Line 4100000 Terms 0
File Line 4200000 Terms 0
File Line 4300000 Terms 0
File Line 4400000 Terms 0
File Line 4500000 Terms 0
File Line 4600000 Terms 0
File Line 4700000 Terms 0
File Line 4800000 Terms 0
File Line 4900000 Terms 0
File Line 5000000 Terms 0
File Line 5100000 Terms 0
File Line 5200000 Terms 0
File Line 5300000 Terms 0
File Line 5400000 Terms 0
File Line 5500000 Terms 0
File Line 5600000 Terms 0
File Line 5700000 Terms 0
File Line 5800000 Terms 0
File Line 5900000 Terms 0
File Line 6000000 Terms 0
File Line 6100000 Terms 0
File Line 6200000 Terms 0
File Line 6300000 Terms 0
File Line 6400000 Terms 0
File Line 6500000 Terms 0
File Line 6600000 Terms 0
File Line 6700000 Terms 0
File Line 6800000 Terms 0
File Line 6900000 Terms 0
File Line 7000000 Terms 0
File Line 7100000 Terms 0
File Line 7200000 Terms 0
File Line 7300000 Terms 0
File Line 7400000 Terms 0
File Line 7500000 Terms 0
File Line 7600000 Terms 0
File Line 7700000 Terms 0
File Line 7800000 Terms 0
File Line 7900000 Terms 0
File Line 8000000 Terms 0
File Line 8100000 Terms 0
File Line 8200000 Terms 0
File Line 8300000 Terms 0
File Line 8400000 Terms 0
File Line 8500000 Terms 0
File Line 8600000 Terms 0
File Line 8700000 Terms 0
File Line 8800000 Terms 0
File Line 8827152 Terms 0
Writing map of Cuis and Texts to pathtoUmls2015.bsv
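In the log above, the TUI step over MRSTY.RRF finds 999,451 CUIs, yet the following pass (presumably over MRCONSO.RRF, given its 8.8 million lines) keeps none of them. That pattern suggests the TUI filter works and the rows are dropped by a later check, such as the source-type list. The sketch below illustrates only the first step as the log describes it: collect CUIs whose TUI is in a wanted set. It is not DictionaryCreator2's actual code; the class name TuiCuiCollector is hypothetical, and the column order assumes the standard MRSTY.RRF layout (CUI|TUI|STN|STY|ATUI|CVF|).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Collects CUIs whose semantic type (TUI) is in a wanted set, reading MRSTY.RRF. */
final class TuiCuiCollector {
    static Set<String> collectCuis(Reader mrsty, Set<String> wantedTuis) {
        Set<String> cuis = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(mrsty)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // MRSTY.RRF columns: CUI|TUI|STN|STY|ATUI|CVF|
                String[] fields = line.split("\\|", -1);
                if (fields.length > 1 && wantedTuis.contains(fields[1])) {
                    cuis.add(fields[0]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cuis;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TuiCuiCollector /path/to/META/MRSTY.RRF T047 T121 ...
        Set<String> wanted = new HashSet<>(Arrays.asList(args).subList(1, args.length));
        Set<String> cuis = collectCuis(
            Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8), wanted);
        System.out.println("Cuis " + cuis.size());
    }
}
```

A CUI can carry several semantic types, which is why the result is a set rather than a list.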

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 4:00 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thank you! I believe that changed after 2011. You should actually be OK with both SNOMEDCT
and SNOMEDCT_US in CtakesSources.txt.

Cheers,
Sean
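Listing both names is harmless if the tool compares each row's source abbreviation against the list as a whole string: the old name simply never matches 2015 data, and the new one does. The sketch below assumes exactly that (an exact, whole-field comparison) and assumes CtakesSources.txt holds one abbreviation per line; neither detail is confirmed here, and the class SourceFilter is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical source-list filter: one SAB per line, exact matching. */
final class SourceFilter {
    /** Reads one source abbreviation per line, trimming whitespace and skipping blank lines. */
    static Set<String> readSources(Reader in) {
        Set<String> sources = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String sab = line.trim();
                if (!sab.isEmpty()) {
                    sources.add(sab);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sources;
    }

    /** Exact match: a list containing only "SNOMEDCT" rejects rows whose SAB is "SNOMEDCT_US". */
    static boolean accepts(Set<String> sources, String sab) {
        return sources.contains(sab);
    }

    public static void main(String[] args) {
        Set<String> oldList = readSources(new StringReader("SNOMEDCT\n"));
        Set<String> bothList = readSources(new StringReader("SNOMEDCT\nSNOMEDCT_US\n"));
        System.out.println("old list accepts SNOMEDCT_US: " + accepts(oldList, "SNOMEDCT_US"));
        System.out.println("both list accepts SNOMEDCT_US: " + accepts(bothList, "SNOMEDCT_US"));
    }
}
```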

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.maite@gmail.com]
Sent: Wednesday, September 16, 2015 3:43 PM
To: dev@ctakes.apache.org
Subject: Re: Fast Dictionary Update

If this helps, I had to replace 'SNOMEDCT' with 'SNOMEDCT_US' in CtakesSources.txt.

On Wed, Sep 16, 2015 at 2:33 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:

> I'm not sure that I understand your question. As I sent it, the anatomy,
> SNOMED, and RxNorm parts are not separate runs. The args line I sent
> earlier is for a single run that will create a dictionary with SNOMED
> and RxNorm terms. The anatomy TUI list has a special use in correctly
> processing SNOMED codes.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 3:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Ok, hopefully one last question.
>
> Based on your example everything runs; however, the Anat and SNOMED
> runs don't produce any valid CUIs, while RxNorm does. I'm not sure if
> this has anything to do with it, but every UMLS source read is against MRSTY.
>
> Here's my command
>
> java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /path/to/UMLS/META -fd ./data/tiny -atui ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt -ol path\to\fileUmls2015.bsv
>
> Any suggestions?
>
> Thanks again,
> Brandon
>
>
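The commands in this thread use ";" in "-cp dictionarytool.jar;lib/*", which is the Windows class path separator; on Linux or macOS the separator is ":". A sketch that sidesteps both the separator and shell quoting is to assemble the command in Java with File.pathSeparator and hand it to ProcessBuilder, which passes each argument to the JVM unchanged, with no shell involved. The flags come from messages in this thread (including "-src" from a later reply); the class name DictionaryToolCommand and the output file name are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds a platform-correct command line for the dictionary tool (hypothetical helper). */
final class DictionaryToolCommand {
    static final String MAIN_CLASS = "org.apache.ctakes.dictionarytool.DictionaryCreator2";

    /** File.pathSeparator is ";" on Windows and ":" on Unix-like systems. */
    static List<String> build(String... toolArgs) {
        String classpath = "dictionarytool.jar" + File.pathSeparator + "lib/*";
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-cp", classpath, MAIN_CLASS));
        cmd.addAll(Arrays.asList(toolArgs));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
            "-umls", "/path/to/UMLS/META",
            "-fd", "./data/tiny",
            "-src", "./data/tiny/CtakesSources.txt",
            "-atui", "./data/tiny/CtakesAnatTuis.txt",
            "-tui", "./data/tiny/CtakesSnomedTuis.txt",
            "-ol", "testout.bsv");
        System.out.println(String.join(" ", cmd));
        // To launch it (the java launcher expands the lib/* wildcard itself):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```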
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 3:05 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Yes, that will make the rare word dictionary in a memory-based hsql 
> database - the same as the default for the dictionary-lookup-fast module.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 2:42 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Thanks Sean, much appreciated. To clarify, the example below would
> create the dictionary for use with the rare word approach?
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 2:16 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I just checked in bin/dictionarytool.zip. It should have everything
> that you need (.jar, lib/, data/). Running
>
> java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 [args]
>
> should do the trick.
>
> To recreate a 2015 version of the current ctakes dictionary, the
> arguments are:
>
> -umls my/path/to/2015AA/META -fd ./data/tiny -atui ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt -db jdbc:hsqldb:file:my/path/to/snorx2015 -tbl CUI_TERMS
>
> Create my/path/to/snorx2015 by copying 
> resources/memdbtemplate/ctakesumls.properties to 
> my/path/to/snorx2015.properties - there is a resources/README about this.
>
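The copy step above can be scripted. The sketch below derives the properties file name from the database path used in the "-db jdbc:hsqldb:file:..." argument (my/path/to/snorx2015 becomes my/path/to/snorx2015.properties) and copies the template there. The paths come from this message; the class HsqlPropertiesSetup is hypothetical, and the resources/README remains the authority on what the properties file should contain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Copies the template properties file next to an HSQLDB file database (hypothetical helper). */
final class HsqlPropertiesSetup {
    /** For dbBase "my/path/to/snorx2015", writes "my/path/to/snorx2015.properties". */
    static Path copyTemplate(Path template, Path dbBase) throws IOException {
        Path target = dbBase.resolveSibling(dbBase.getFileName() + ".properties");
        if (target.getParent() != null) {
            // Create my/path/to/ if it does not exist yet.
            Files.createDirectories(target.getParent());
        }
        return Files.copy(template, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path written = copyTemplate(
            Paths.get("resources/memdbtemplate/ctakesumls.properties"),
            Paths.get("my/path/to/snorx2015"));
        System.out.println("Wrote " + written);
    }
}
```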
> Before populating a DB, I usually do a trial run first, writing to a 
> flat file. Replace "-db ... -tbl ..." with "-ol my/path/to/testout.bsv"
>
>
> Sean
>
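A quick check on the trial flat file can catch the failure discussed in this thread (every "Terms" count at 0) before anything is written to a database. The sketch below counts non-empty rows and distinct values in the first "|"-separated field. It assumes the .bsv output is pipe-delimited with the CUI in the first column; that layout is an assumption, not something stated here, so adjust the field index if the real file differs. The class BsvTrialCheck is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

/** Summarizes a trial .bsv file (assumed layout: CUI in the first "|"-separated field). */
final class BsvTrialCheck {
    static final class Summary {
        final int rows;
        final int distinctFirstFields;

        Summary(int rows, int distinctFirstFields) {
            this.rows = rows;
            this.distinctFirstFields = distinctFirstFields;
        }
    }

    static Summary summarize(Reader in) {
        int rows = 0;
        Set<String> firstFields = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                rows++;
                // Assumption: the first field of each row is the CUI.
                firstFields.add(line.split("\\|", -1)[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Summary(rows, firstFields.size());
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BsvTrialCheck my/path/to/testout.bsv
        Summary s = summarize(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8));
        System.out.println("rows=" + s.rows + " distinct first fields=" + s.distinctFirstFields);
        if (s.rows == 0) {
            System.out.println("Empty output: check the source list before populating a database.");
        }
    }
}
```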
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:49 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Sean,
>
> That'd be great.
>
> I think I'm building it incorrectly, because after I build the jar and
> try to run it with DictionaryCreator2 as the main class, it says it
> can't find it. I'm not too familiar with Java and building
> projects/jars, so it could be my ignorance causing the problem.
>
> Thanks,
> Brandon
>
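A "can't find main class" error usually means the JVM was given a class name it cannot load from the class path: either the short name (DictionaryCreator2) was used instead of the fully qualified one, or the jar holding the class is not on -cp. The sketch below checks loadability with Class.forName without initializing the class. Run it with the same -cp value as the tool; the class MainClassCheck is hypothetical, and the fully qualified name is the one used elsewhere in this thread.

```java
/** Reports whether candidate main classes are loadable from the current class path. */
final class MainClassCheck {
    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, MainClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The JVM needs the fully qualified name; the short name alone will not resolve.
        String[] candidates = {
            "DictionaryCreator2",
            "org.apache.ctakes.dictionarytool.DictionaryCreator2"
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "not on class path"));
        }
    }
}
```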
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 1:45 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I can send you a jar or commit one pre-built. What goes wrong when you 
> try to build the tool?
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:23 PM
> To: 'dev@ctakes.apache.org'
> Subject: Fast Dictionary Update
>
> Does someone have the DictionaryTool jar available? I'm having trouble 
> creating the jar file from the project and would like to be able to 
> create an updated UMLS fast dictionary for 2015.
>
> Thanks,
> Brandon
>



