ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: How to update cTAKES so that new top level categories come out based on local dictionary?
Date Tue, 06 Oct 2015 21:04:56 GMT
Hi Chris,

I use "bsv" to denote "bar-separated values", also known as "pipe-delimited".  I typically name
the files with a ".bsv" extension, and they are just plain old boring ASCII flat files.
Each line of the bsv file should have multiple columns separated by the '|' character.  The following
are all valid per-line formats:
CUI|text
CUI|TUI|text
CUI|TUI|text|preferredText
It doesn't matter which format you choose; the parser will auto-detect the format of each line.  Starting
a line with "//" or "#" indicates that it is a comment and should be ignored.
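
To make the per-line formats concrete, here is a minimal Python sketch of a reader for such a file. It is not the cTAKES parser (that one is Java); the names BsvEntry and parse_bsv_line and the sample terms are made up for illustration, and the only rules it assumes are the ones described above: 2, 3 or 4 '|'-separated columns, with "//" and "#" lines skipped.

```python
from typing import NamedTuple, Optional


class BsvEntry(NamedTuple):
    """One dictionary row; fields missing from a short line are None."""
    cui: str
    tui: Optional[str]
    text: str
    preferred_text: Optional[str]


def parse_bsv_line(line: str) -> Optional[BsvEntry]:
    """Parse one line, detecting its format from the column count.

    Returns None for blank lines and for comments starting with // or #.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith(("//", "#")):
        return None
    columns = [column.strip() for column in stripped.split("|")]
    if len(columns) == 2:  # CUI|text
        return BsvEntry(columns[0], None, columns[1], None)
    if len(columns) == 3:  # CUI|TUI|text
        return BsvEntry(columns[0], columns[1], columns[2], None)
    if len(columns) == 4:  # CUI|TUI|text|preferredText
        return BsvEntry(*columns)
    raise ValueError(f"unexpected column count {len(columns)}: {line!r}")


# Hypothetical sample content; the codes and terms are invented.
sample = """\
# custom terms
C0000001|first custom term
2|T047|second custom term
// the next line uses all four columns
CAM12|T123|Third Custom Term|third term, preferred
"""

entries = [e for e in map(parse_bsv_line, sample.splitlines()) if e is not None]
for entry in entries:
    print(entry)
```

Because the format is detected per line, the three styles can be mixed freely in one file, as the sample does.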


To add the bsv dictionary to your pipeline you just need to edit the resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
file and add three new entries.
Within the <dictionaries> section, add:
      <dictionary>
         <name>CustomCuiRareWord</name>
         <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
         <properties>
            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
         </properties>
      </dictionary>
Within the <conceptFactories> section, add:
      <conceptFactory>
         <name>CustomCuiConcept</name>
         <implementationName>org.apache.ctakes.dictionary.lookup2.concept.BsvConceptFactory</implementationName>
         <properties>
            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
         </properties>
      </conceptFactory>
Within the <dictionaryConceptPairs> section, add:
      <dictionaryConceptPair>
         <name>CustomPair</name>
         <dictionaryName>CustomCuiRareWord</dictionaryName>
         <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
      </dictionaryConceptPair>
You can change all of the [Custom**] names, and you should obviously point to the actual path
of your bsv file.
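
If you do rename things, the names in the dictionaryConceptPair must still match the dictionary and conceptFactory names. A small Python sketch of a sanity check for that follows. It is only an illustration, not part of cTAKES: the function name find_dangling_pairs is made up, the "config" root element is a placeholder (the real root of cTakesHsql.xml is not shown here), and the embedded XML is trimmed down to the name elements.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the edited file. The <config> root is a
# placeholder; only the element names quoted in this message are used.
CONFIG = """\
<config>
  <dictionaries>
    <dictionary>
      <name>CustomCuiRareWord</name>
    </dictionary>
  </dictionaries>
  <conceptFactories>
    <conceptFactory>
      <name>CustomCuiConcept</name>
    </conceptFactory>
  </conceptFactories>
  <dictionaryConceptPairs>
    <dictionaryConceptPair>
      <name>CustomPair</name>
      <dictionaryName>CustomCuiRareWord</dictionaryName>
      <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
    </dictionaryConceptPair>
  </dictionaryConceptPairs>
</config>
"""


def find_dangling_pairs(xml_text):
    """Return messages for pairs that reference undefined names."""
    root = ET.fromstring(xml_text)
    dictionaries = {e.findtext("name") for e in root.iter("dictionary")}
    factories = {e.findtext("name") for e in root.iter("conceptFactory")}
    problems = []
    for pair in root.iter("dictionaryConceptPair"):
        pair_name = pair.findtext("name")
        dictionary_name = pair.findtext("dictionaryName")
        factory_name = pair.findtext("conceptFactoryName")
        if dictionary_name not in dictionaries:
            problems.append(f"{pair_name}: unknown dictionary {dictionary_name}")
        if factory_name not in factories:
            problems.append(f"{pair_name}: unknown concept factory {factory_name}")
    return problems


problems = find_dangling_pairs(CONFIG)
print(problems or "all pair references resolve")
```

To check a real file, read its text from disk and pass it to find_dangling_pairs in place of CONFIG.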

In addition to detecting your column count/style, upon loading the text will be lower-cased
and tokenized and the terms will be indexed by rare word (for fast lookup).   Also, you do
not need to write out the whole "C1234567" or "T123" cui tui codes.  The default prefix characters
and padding zeros are automatically added.   Cuis "1" "01" "C1" "C01" will all be stored as
"C0000001" and Tuis are handled likewise.  If you have custom cuis then it will honor non-"C"
prefixes and still pad zeros automatically based upon the longest entry.  For instance, if
your bsv has "CAM1", "CAM12" and "CAM12345" then the stored custom cuis should be "CAM00001",
"CAM00012" and "CAM12345".

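The prefix and padding behaviour can be sketched in a few lines of Python. This is my reading of the rules above, not the actual cTAKES code: the function names are made up, the default cui is assumed to be "C" plus 7 digits and the default tui "T" plus 3 digits, and a custom prefix is assumed to be padded to the longest digit run seen with that prefix.

```python
import re


def split_code(code):
    """Split a code into its (possibly empty) letter prefix and its digits."""
    match = re.fullmatch(r"([A-Za-z]*)(\d+)", code.strip())
    if match is None:
        raise ValueError(f"not a prefix-plus-digits code: {code!r}")
    return match.group(1), match.group(2)


def normalize_codes(codes, default_prefix="C", default_width=7):
    """Add the default prefix and zero padding; honor custom prefixes."""
    parts = [split_code(code) for code in codes]
    # Assumed rule: a custom prefix is padded to its longest digit run.
    custom_width = {}
    for prefix, digits in parts:
        if prefix not in ("", default_prefix):
            custom_width[prefix] = max(custom_width.get(prefix, 0), len(digits))
    normalized = []
    for prefix, digits in parts:
        if prefix in ("", default_prefix):
            normalized.append(default_prefix + digits.zfill(default_width))
        else:
            normalized.append(prefix + digits.zfill(custom_width[prefix]))
    return normalized


print(normalize_codes(["1", "01", "C1", "C01"]))
print(normalize_codes(["CAM1", "CAM12", "CAM12345"]))
print(normalize_codes(["123", "T47"], default_prefix="T", default_width=3))
```

The first call yields "C0000001" four times, the second yields "CAM00001", "CAM00012" and "CAM12345", and the third shows the same idea applied to tuis.
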
I think that is about all that there is to it ...

Sean

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Tuesday, October 06, 2015 4:31 PM
To: dev@ctakes.apache.org
Subject: Re: How to update cTAKES so that new top level categories come out based on local
dictionary?

Hi Sean,

Thanks so much for your reply. For now I don’t care about the secondary
codes and I for sure have < 1000 terms. Can you tell me how to wire up
the BSV file by editing specific places in cTAKES? What specific commands
should I run or what format should the BSV file look like? I must admit
I have never heard of BSV files and the Internet varies on these between
Bluespec System Verilog and BASIC bsave files.

Then after I make the BSV file, what steps next? Recompile cTAKES? Can
I take the BSV file and simply point to it from a binary installation of
cTAKES? Thank you!

Cheers,
Chris



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Finan, Sean" <Sean.Finan@childrens.harvard.edu>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Tuesday, October 6, 2015 at 8:05 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: RE: How to update cTAKES so that new top level categories come
out based on local dictionary?

>Hi Chris,
>
>There are a few ways to do this:
>1.  Create an additional dictionary with the terms of interest and add it
>as a source
>2.  Create a new dictionary hsqldb that contains everything, old and new
>3.  Add to the existing hsqldb dictionary
>
>The best approach for you would probably depend upon
>1.  How many new terms you have
>2.  Whether or not you desire additional codes, e.g. rxnorm, snomed
>
>If you don't have many new terms (<1000) and you don't care about
>secondary codes then the easiest thing would be to create a BSV file with
>the new terms and cuis.
>
>If you have a lot of new terms or do care about secondary codes, then a
>less facile solution would be to create a new hsqldb with only the new
>info or a complete replacement with new and old/existing terms.  Of the
>two hsql options creating a new all-inclusive database would probably be
>easier unless you want to learn the ins and outs of hsql.  If all of the
>terms are in the umls, then the new all-inclusive hsqldb would definitely
>be easiest (I think) as you could use the dictionary tool to create it.
>
>If you let me know your exact situation then I may be able to better
>expound.
>
>Sean
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Monday, October 05, 2015 7:36 PM
>To: dev@ctakes.apache.org
>Subject: How to update cTAKES so that new top level categories come out
>based on local dictionary?
>
>Hi cTAKES team,
>
>Hope you’re well! I had a quick question. I was wondering if someone
>could provide me a step-by-step guide to updating cTAKES to be based
>off a local dictionary, so that in addition to e.g.,
>
>ProceduralMention
>  Value1 position etc
>  Value2 position etc
>
>MedicationMention
>  Value1 position etc
>  Value2 position etc
>
>NewTopLevelCategoryFromMyDictionary
>  FoundValue1 position etc
>  FoundValue2 position etc
>
>I realize this has something to do with updating the annotation
>descriptions etc in XML, so if someone could just tell me what
>to update I’d really appreciate it.
>
>Thank you!
>
>Cheers,
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: chris.a.mattmann@nasa.gov
>WWW:  http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


