ctakes-dev mailing list archives

From: Tomasz Oliwa <ol...@uchicago.edu>
Subject: RE: Fast Dictionary Update
Date: Wed, 16 Sep 2015 23:13:12 GMT

I have exactly the same problem with the tool.

A grep on MRCONSO.RRF for "SNOMEDCT" or for "SNOMEDCT_US" shows many lines.
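
Note that a plain substring grep for "SNOMEDCT" also matches "SNOMEDCT_US", so it cannot show which abbreviation the file actually uses. Below is a minimal Python sketch that counts exact values of the source abbreviation (SAB) field instead. It assumes the standard pipe-delimited MRCONSO.RRF layout with SAB as the 12th field; the function name count_sources is hypothetical and not part of the dictionary tool.

```python
from collections import Counter


def count_sources(lines, prefix="SNOMEDCT"):
    """Count exact SAB values beginning with `prefix` in MRCONSO-style rows."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Assumption: SAB is the 12th pipe-delimited field (index 11).
        if len(fields) > 11 and fields[11].startswith(prefix):
            counts[fields[11]] += 1
    return counts
```

Passing an open file handle for MRCONSO.RRF as `lines` and printing the result would show, for example, whether rows carry SNOMEDCT, SNOMEDCT_US, or both.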

________________________________________
From: Geise, Brandon D. [bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 5:05 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Yes, it finds "SNOMEDCT_US".

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 5:17 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Ah, now I see what you mean.  Can you do a grep on your MRCONSO.RRF for "SNOMEDCT"?

-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Wednesday, September 16, 2015 4:04 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

I tried changing it as suggested.

Below is what I see for the SNOMED piece; for RXNorm, though, it does write terms at the end.

Reading list of Source Types from ./data/default/CtakesSources.txt
File Lines 1     list of Source Types 1
Reading list of Tuis from ./data/tiny/CtakesSnomedTuis.txt
File Lines 24    list of Tuis 24
Compiling list of Cuis with wanted Tuis using /patto/UMLS_Current_Version/META/MRSTY.RRF
File Line 200000         Cuis 60895
File Line 300000         Cuis 85750
File Line 400000         Cuis 135098
File Line 600000         Cuis 183925
File Line 1700000        Cuis 376338
File Line 1800000        Cuis 471009
File Line 1900000        Cuis 568375
File Line 2100000        Cuis 674715
File Line 2800000        Cuis 903583
File Line 3300000        Cuis 973791
File Lines 3370173       Cuis 999451
..................................................File Line 100000       Valid Cuis 0
..................................................File Line 200000       Valid Cuis 0
..................................................File Line 300000       Valid Cuis 0
..................................................File Line 400000       Valid Cuis 0
..................................................File Line 500000       Valid Cuis 0
..................................................File Line 600000       Valid Cuis 0
..................................................File Line 700000       Valid Cuis 0
..................................................File Line 800000       Valid Cuis 0
..................................................File Line 900000       Valid Cuis 0
..................................................File Line 1000000      Valid Cuis 0
..................................................File Line 1100000      Valid Cuis 0
..................................................File Line 1200000      Valid Cuis 0
..................................................File Line 1300000      Valid Cuis 0
..................................................File Line 1400000      Valid Cuis 0
..................................................File Line 1500000      Valid Cuis 0
..................................................File Line 1600000      Valid Cuis 0
..................................................File Line 1700000      Valid Cuis 0
..................................................File Line 1800000      Valid Cuis 0
..................................................File Line 1900000      Valid Cuis 0
..................................................File Line 2000000      Valid Cuis 0
..................................................File Line 2100000      Valid Cuis 0
..................................................File Line 2200000      Valid Cuis 0
..................................................File Line 2300000      Valid Cuis 0
..................................................File Line 2400000      Valid Cuis 0
..................................................File Line 2500000      Valid Cuis 0
..................................................File Line 2600000      Valid Cuis 0
..................................................File Line 2700000      Valid Cuis 0
..................................................File Line 2800000      Valid Cuis 0
..................................................File Line 2900000      Valid Cuis 0
..................................................File Line 3000000      Valid Cuis 0
..................................................File Line 3100000      Valid Cuis 0
..................................................File Line 3200000      Valid Cuis 0
..................................................File Line 3300000      Valid Cuis 0
..................................................File Line 3400000      Valid Cuis 0
..................................................File Line 3500000      Valid Cuis 0
..................................................File Line 3600000      Valid Cuis 0
..................................................File Line 3700000      Valid Cuis 0
..................................................File Line 3800000      Valid Cuis 0
..................................................File Line 3900000      Valid Cuis 0
..................................................File Line 4000000      Valid Cuis 0
..................................................File Line 4100000      Valid Cuis 0
..................................................File Line 4200000      Valid Cuis 0
..................................................File Line 4300000      Valid Cuis 0
..................................................File Line 4400000      Valid Cuis 0
..................................................File Line 4500000      Valid Cuis 0
..................................................File Line 4600000      Valid Cuis 0
..................................................File Line 4700000      Valid Cuis 0
..................................................File Line 4800000      Valid Cuis 0
..................................................File Line 4900000      Valid Cuis 0
..................................................File Line 5000000      Valid Cuis 0
..................................................File Line 5100000      Valid Cuis 0
..................................................File Line 5200000      Valid Cuis 0
..................................................File Line 5300000      Valid Cuis 0
..................................................File Line 5400000      Valid Cuis 0
..................................................File Line 5500000      Valid Cuis 0
..................................................File Line 5600000      Valid Cuis 0
..................................................File Line 5700000      Valid Cuis 0
..................................................File Line 5800000      Valid Cuis 0
..................................................File Line 5900000      Valid Cuis 0
..................................................File Line 6000000      Valid Cuis 0
..................................................File Line 6100000      Valid Cuis 0
..................................................File Line 6200000      Valid Cuis 0
..................................................File Line 6300000      Valid Cuis 0
..................................................File Line 6400000      Valid Cuis 0
..................................................File Line 6500000      Valid Cuis 0
..................................................File Line 6600000      Valid Cuis 0
..................................................File Line 6700000      Valid Cuis 0
..................................................File Line 6800000      Valid Cuis 0
..................................................File Line 6900000      Valid Cuis 0
..................................................File Line 7000000      Valid Cuis 0
..................................................File Line 7100000      Valid Cuis 0
..................................................File Line 7200000      Valid Cuis 0
..................................................File Line 7300000      Valid Cuis 0
..................................................File Line 7400000      Valid Cuis 0
..................................................File Line 7500000      Valid Cuis 0
..................................................File Line 7600000      Valid Cuis 0
..................................................File Line 7700000      Valid Cuis 0
..................................................File Line 7800000      Valid Cuis 0
..................................................File Line 7900000      Valid Cuis 0
..................................................File Line 8000000      Valid Cuis 0
..................................................File Line 8100000      Valid Cuis 0
..................................................File Line 8200000      Valid Cuis 0
..................................................File Line 8300000      Valid Cuis 0
..................................................File Line 8400000      Valid Cuis 0
..................................................File Line 8500000      Valid Cuis 0
..................................................File Line 8600000      Valid Cuis 0
..................................................File Line 8700000      Valid Cuis 0
..................................................File Line 8800000      Valid Cuis 0
.............File Lines 8827152  Valid Cuis 0
Compiling map of Umls Cuis and Texts
..................................................File Line 100000       Terms 0
..................................................File Line 200000       Terms 0
..................................................File Line 300000       Terms 0
..................................................File Line 400000       Terms 0
..................................................File Line 500000       Terms 0
..................................................File Line 600000       Terms 0
..................................................File Line 700000       Terms 0
..................................................File Line 800000       Terms 0
..................................................File Line 900000       Terms 0
..................................................File Line 1000000      Terms 0
..................................................File Line 1100000      Terms 0
..................................................File Line 1200000      Terms 0
..................................................File Line 1300000      Terms 0
..................................................File Line 1400000      Terms 0
..................................................File Line 1500000      Terms 0
..................................................File Line 1600000      Terms 0
..................................................File Line 1700000      Terms 0
..................................................File Line 1800000      Terms 0
..................................................File Line 1900000      Terms 0
..................................................File Line 2000000      Terms 0
..................................................File Line 2100000      Terms 0
..................................................File Line 2200000      Terms 0
..................................................File Line 2300000      Terms 0
..................................................File Line 2400000      Terms 0
..................................................File Line 2500000      Terms 0
..................................................File Line 2600000      Terms 0
..................................................File Line 2700000      Terms 0
..................................................File Line 2800000      Terms 0
..................................................File Line 2900000      Terms 0
..................................................File Line 3000000      Terms 0
..................................................File Line 3100000      Terms 0
..................................................File Line 3200000      Terms 0
..................................................File Line 3300000      Terms 0
..................................................File Line 3400000      Terms 0
..................................................File Line 3500000      Terms 0
..................................................File Line 3600000      Terms 0
..................................................File Line 3700000      Terms 0
..................................................File Line 3800000      Terms 0
..................................................File Line 3900000      Terms 0
..................................................File Line 4000000      Terms 0
..................................................File Line 4100000      Terms 0
..................................................File Line 4200000      Terms 0
..................................................File Line 4300000      Terms 0
..................................................File Line 4400000      Terms 0
..................................................File Line 4500000      Terms 0
..................................................File Line 4600000      Terms 0
..................................................File Line 4700000      Terms 0
..................................................File Line 4800000      Terms 0
..................................................File Line 4900000      Terms 0
..................................................File Line 5000000      Terms 0
..................................................File Line 5100000      Terms 0
..................................................File Line 5200000      Terms 0
..................................................File Line 5300000      Terms 0
..................................................File Line 5400000      Terms 0
..................................................File Line 5500000      Terms 0
..................................................File Line 5600000      Terms 0
..................................................File Line 5700000      Terms 0
..................................................File Line 5800000      Terms 0
..................................................File Line 5900000      Terms 0
..................................................File Line 6000000      Terms 0
..................................................File Line 6100000      Terms 0
..................................................File Line 6200000      Terms 0
..................................................File Line 6300000      Terms 0
..................................................File Line 6400000      Terms 0
..................................................File Line 6500000      Terms 0
..................................................File Line 6600000      Terms 0
..................................................File Line 6700000      Terms 0
..................................................File Line 6800000      Terms 0
..................................................File Line 6900000      Terms 0
..................................................File Line 7000000      Terms 0
..................................................File Line 7100000      Terms 0
..................................................File Line 7200000      Terms 0
..................................................File Line 7300000      Terms 0
..................................................File Line 7400000      Terms 0
..................................................File Line 7500000      Terms 0
..................................................File Line 7600000      Terms 0
..................................................File Line 7700000      Terms 0
..................................................File Line 7800000      Terms 0
..................................................File Line 7900000      Terms 0
..................................................File Line 8000000      Terms 0
..................................................File Line 8100000      Terms 0
..................................................File Line 8200000      Terms 0
..................................................File Line 8300000      Terms 0
..................................................File Line 8400000      Terms 0
..................................................File Line 8500000      Terms 0
..................................................File Line 8600000      Terms 0
..................................................File Line 8700000      Terms 0
..................................................File Line 8800000      Terms 0
.............File Line 8827152   Terms 0
Writing map of Cuis and Texts to \pathto\Umls2015.bsv
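
One possible reading of the log above: the first pass over MRSTY.RRF finds plenty of CUIs with wanted TUIs (999451), and the later passes then keep only those CUIs that have a row from a listed source. The log does not name the second file; the sketch below assumes it is MRCONSO.RRF with CUI as the 1st field and SAB as the 12th. Under that reading, zero valid CUIs points at the source list rather than the TUI list. This is an illustration of the idea in Python, not the tool's actual code, and both function names are hypothetical.

```python
def cuis_with_wanted_tuis(mrsty_lines, wanted_tuis):
    """Stage 1: collect CUIs whose semantic type (TUI) is in the wanted set."""
    cuis = set()
    for line in mrsty_lines:
        fields = line.split("|")
        # Assumption: MRSTY.RRF rows begin with CUI|TUI|...
        if len(fields) > 1 and fields[1] in wanted_tuis:
            cuis.add(fields[0])
    return cuis


def valid_cuis(mrconso_lines, candidate_cuis, wanted_sources):
    """Stage 2: keep candidate CUIs that have a row from a wanted source."""
    valid = set()
    for line in mrconso_lines:
        fields = line.split("|")
        # Assumption: CUI is field 1 and SAB is field 12 of MRCONSO.RRF.
        if (
            len(fields) > 11
            and fields[0] in candidate_cuis
            and fields[11] in wanted_sources
        ):
            valid.add(fields[0])
    return valid
```

With a source list containing only SNOMEDCT and rows tagged SNOMEDCT_US, stage 2 returns an empty set even though stage 1 found candidates, which would match the "Valid Cuis 0" lines.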

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Wednesday, September 16, 2015 4:00 PM
To: dev@ctakes.apache.org
Subject: RE: Fast Dictionary Update

Thank you!  I believe that was a change made after 2011.  You should actually be OK
listing both SNOMEDCT and SNOMEDCT_US in CtakesSources.txt.

Cheers,
Sean
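
As a rough Python sketch of that suggestion, the helper below appends whichever of the two abbreviations is missing from a sources file. It assumes CtakesSources.txt holds one source abbreviation per line, which the thread does not confirm; the function name ensure_sources is hypothetical.

```python
from pathlib import Path


def ensure_sources(path, wanted=("SNOMEDCT", "SNOMEDCT_US")):
    """Append any missing source abbreviation; return the names that were added."""
    file = Path(path)
    existing = file.read_text(encoding="utf-8").splitlines() if file.exists() else []
    # Assumption: one abbreviation per line; blank lines are dropped on rewrite.
    entries = [line.strip() for line in existing if line.strip()]
    missing = [name for name in wanted if name not in entries]
    if missing:
        file.write_text("\n".join(entries + missing) + "\n", encoding="utf-8")
    return missing
```

Calling it a second time on the same file returns an empty list and leaves the file untouched, so it is safe to run before each dictionary build.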

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.maite@gmail.com]
Sent: Wednesday, September 16, 2015 3:43 PM
To: dev@ctakes.apache.org
Subject: Re: Fast Dictionary Update

If this helps: I had to replace 'SNOMEDCT' with 'SNOMEDCT_US' in CtakesSources.txt.

On Wed, Sep 16, 2015 at 2:33 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> I'm not sure that I understand your question.  As I sent it, the anat,
> snomed and rxnorm are not separate runs.  The args line I sent earlier
> is for a single run that will create a dictionary with snomed and
> rxnorm terms.  The anatomy tui list has a special use in correctly
> processing snomed codes.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 3:27 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Ok, hopefully one last question.
>
> Based on your example, everything runs; however, the Anat and Snomed
> runs don't produce any valid CUIs, but RXNorm does.  I'm not sure if
> this has anything to do with it, but every UMLS source read is against MRSTY.
>
> Here's my command:
>
> java -cp dictionarytool.jar;lib/*
> org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
> /path/to/UMLS/META -fd ./data/tiny -atui
> ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt
> -ol \path\to\file\Umls2015.bsv
>
> Any suggestions?
>
> Thanks again,
> Brandon
>
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 3:05 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Yes, that will make the rare word dictionary in a memory-based hsql
> database - the same as the default for the dictionary-lookup-fast module.
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 2:42 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Thanks Sean, much appreciated.  To clarify: would the example below
> create the dictionary for use with the rare word approach?
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 2:16 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I just checked in a bin/dictionarytool.zip.  It should have everything
> that you need (.jar, lib/, data/).  Running
>
> java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 [args]
>
> should do the trick.
>
> To recreate a 2015 version of the current ctakes dictionary, the
> arguments are:
>
> -umls my/path/to/2015AA/META -fd ./data/tiny
> -atui ./data/tiny/CtakesAnatTuis.txt -tui ./data/tiny/CtakesSnomedTuis.txt
> -db jdbc:hsqldb:file:my/path/to/snorx2015 -tbl CUI_TERMS
>
> Create my/path/to/snorx2015 by copying
> resources/memdbtemplate/ctakesumls.properties to
> my/path/to/snorx2015.properties.  There is a resources/README about this.
>
> Before populating a DB, I usually do a trial run first, writing to a
> flat file.  Replace "-db ... -tbl ..." with "-ol my/path/to/testout.bsv"
>
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:49 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Sean,
>
> That'd be great.
>
> I think I'm building it incorrectly because after I build the jar and
> try to run specifying DictionaryCreator2 as the main class it says it
> can't find it.  I'm not too familiar with Java and building
> projects/jars so it could be my ignorance causing the problem.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Wednesday, September 16, 2015 1:45 PM
> To: dev@ctakes.apache.org
> Subject: RE: Fast Dictionary Update
>
> Hi Brandon,
>
> I can send you a jar or commit one pre-built.  What goes wrong when
> you try to build the tool?
>
> Sean
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Wednesday, September 16, 2015 1:23 PM
> To: 'dev@ctakes.apache.org'
> Subject: Fast Dictionary Update
>
> Does someone have the DictionaryTool jar available?  I'm having
> trouble creating the jar file from the project and would like to be
> able to create an updated UMLS fast dictionary for 2015.
>
> Thanks,
> Brandon
>
