ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: How to update cTAKES so that new top level categories come out based on local dictionary?
Date Thu, 08 Oct 2015 13:34:46 GMT
Hi Chris,

Just off the cuff, have you tried using the relative path
"org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"?

Relative paths within the $CLASSPATH should work in trunk, but perhaps that won't be
available until the next release?  I haven't tested recently (I should add a JUnit test ...).

Sean
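
A rough sketch of the idea behind the suggestion above, written in Python purely for illustration (cTAKES itself is Java, and this is not its loader). It assumes the distribution's resources directory acts as a lookup root, much like a classpath entry; `resolve_resource`, `demo` and the temporary layout are hypothetical names made up for this example.

```python
import tempfile
from pathlib import Path

BSV = "org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"


def resolve_resource(relative_path, roots):
    """Return the first existing file found under any root, else None."""
    for root in roots:
        candidate = Path(root) / relative_path
        if candidate.is_file():
            return candidate
    return None


def demo():
    """Build a throwaway install tree and try both spellings of the path."""
    with tempfile.TemporaryDirectory() as install:
        resources = Path(install) / "resources"
        target = resources / BSV
        target.parent.mkdir(parents=True)
        target.write_text("C0000001|example term\n")
        roots = [resources]  # assumed: resources/ is the lookup root
        found = resolve_resource(BSV, roots)
        missing = resolve_resource("resources/" + BSV, roots)
        return found is not None, missing is None


if __name__ == "__main__":
    print(demo())
```

Under that assumption, the path without the leading "resources/" resolves, while the prefixed one is looked up as resources/resources/... and is not found.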

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Wednesday, October 07, 2015 11:34 PM
To: dev@ctakes.apache.org
Subject: Re: How to update cTAKES so that new top level categories come out based on local
dictionary?

Hi Sean,

One more question too:

So, I put the bsv files in the resources directory as part of my
Apache cTAKES 3.2.2 distribution:

/usr/local/apache-ctakes-3.2.2-bin/resources

underneath:

org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv

and I referenced it like this (as an example just including the dictionary
def, path is same for the concept factory):

      <dictionary>
         <name>CustomCuiRareWord</name>
         <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
         <properties>
            <property key="bsvPath" value="resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv"/>
         </properties>
      </dictionary>





Here’s what I see in the logs:

<snip>
7 Oct 2015 20:31:01  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
07 Oct 2015 20:31:01  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
07 Oct 2015 20:31:01  INFO DictionaryDescriptorParser - Parsing dictionary specifications: /data/hosts/web-dev.aws-redda.celgene.com/local/cdeploy/shangridocs/shangridocs-tika/ctakes/apache-ctakes-3.2.2/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
07 Oct 2015 20:31:01  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann:
..
07 Oct 2015 20:31:02  INFO UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user chrismattmann has been validated
07 Oct 2015 20:31:02  INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx:
......
07 Oct 2015 20:31:04  INFO JdbcConnectionFactory -  Database connected
07 Oct 2015 20:31:04 ERROR BsvRareWordDictionary - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
07 Oct 2015 20:31:04 ERROR BsvConceptFactory - resources/org/apache/ctakes/dictionary/lookup/fast/example/bsv/file.bsv (No such file or directory)
</snip>





I’ve tried all the variants. For example, in the cTakesHsql.xml file I see resources
as a prefix for the hsqldb file, so I tried that too, and it doesn’t work. I’ve also
tried it without resources as a prefix, and that doesn’t work either.

Any ideas?

Cheers,
Chris





++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Finan, Sean" <Sean.Finan@childrens.harvard.edu>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Tuesday, October 6, 2015 at 2:04 PM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: RE: How to update cTAKES so that new top level categories come out based on local dictionary?



>Hi Chris,
>
>I use bsv to denote "bar separated value" - also known as "pipe
>delimited".  I typically name the files with a ".bsv" extension, and they
>are just plain old boring ascii flat files.
>There should be multiple columns in the bsv file separated by the '|'
>character.  The following are all valid per-line formats:
>CUI|text
>CUI|TUI|text
>CUI|TUI|text|preferredText
>It doesn't matter which format you choose; the parser will auto-detect
>per line.  Starting a line with "//" or "#" indicates that it is a
>comment and should be ignored.
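
To make the three per-line formats concrete, here is a small Python sketch of per-line auto-detection by column count. It is only an illustration of the format described above, not the cTAKES parser (which is Java); `parse_bsv_line` is a made-up name, and skipping blank lines is an assumption of this sketch.

```python
def parse_bsv_line(line):
    """Parse one BSV line into a dict, or return None for comments/blanks."""
    stripped = line.strip()
    # Assumption: blank lines are skipped, like the "//" and "#" comments.
    if not stripped or stripped.startswith("//") or stripped.startswith("#"):
        return None
    columns = [column.strip() for column in stripped.split("|")]
    if len(columns) == 2:
        cui, text = columns
        tui, preferred_text = None, None
    elif len(columns) == 3:
        cui, tui, text = columns
        preferred_text = None
    elif len(columns) == 4:
        cui, tui, text, preferred_text = columns
    else:
        raise ValueError(f"expected 2 to 4 columns, got {len(columns)}: {line!r}")
    return {"cui": cui, "tui": tui, "text": text, "preferred_text": preferred_text}


if __name__ == "__main__":
    for sample in ["# a comment", "C1|aspirin", "C1|T121|aspirin|Aspirin"]:
        print(parse_bsv_line(sample))
```

Because detection happens per line, a single file can mix two-, three- and four-column rows.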

>

>

>To add the bsv dictionary to your pipeline you just need to edit the
>resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml file
>and add a couple new sections.
>Within the <dictionaries> section, add:
>      <dictionary>
>         <name>CustomCuiRareWord</name>
>         <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary</implementationName>
>         <properties>
>            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>         </properties>
>      </dictionary>
>Within the <conceptFactories> section, add:
>      <conceptFactory>
>         <name>CustomCuiConcept</name>
>         <implementationName>org.apache.ctakes.dictionary.lookup2.concept.BsvConceptFactory</implementationName>
>         <properties>
>            <property key="bsvPath" value="org/apache/ctakes/dictionary/fast/example/custom_cui_tui_bsv.bsv"/>
>         </properties>
>      </conceptFactory>
>Within the <dictionaryConceptPairs> section, add:
>      <dictionaryConceptPair>
>         <name>CustomPair</name>
>         <dictionaryName>CustomCuiRareWord</dictionaryName>
>         <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
>      </dictionaryConceptPair>
>You can change all of the [Custom**] names, and you should obviously
>point to the actual path of your bsv file.
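
Since the three sections have to agree on names, a quick sanity check before launching the pipeline may save a round trip. The Python sketch below is a hypothetical helper, not part of cTAKES: it parses a descriptor fragment, confirms every dictionaryConceptPair refers to a defined dictionary and concept factory, and lists the bsvPath values. The root element name and the "path/to/custom.bsv" value in the sample are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: root element name and bsvPath value are assumptions.
SAMPLE = """
<lookupSpecification>
  <dictionaries>
    <dictionary>
      <name>CustomCuiRareWord</name>
      <properties>
        <property key="bsvPath" value="path/to/custom.bsv"/>
      </properties>
    </dictionary>
  </dictionaries>
  <conceptFactories>
    <conceptFactory>
      <name>CustomCuiConcept</name>
      <properties>
        <property key="bsvPath" value="path/to/custom.bsv"/>
      </properties>
    </conceptFactory>
  </conceptFactories>
  <dictionaryConceptPairs>
    <dictionaryConceptPair>
      <name>CustomPair</name>
      <dictionaryName>CustomCuiRareWord</dictionaryName>
      <conceptFactoryName>CustomCuiConcept</conceptFactoryName>
    </dictionaryConceptPair>
  </dictionaryConceptPairs>
</lookupSpecification>
"""


def check_pairs(xml_text):
    """Return messages for pairs that reference undefined names."""
    root = ET.fromstring(xml_text.strip())
    dictionaries = {d.findtext("name") for d in root.iter("dictionary")}
    factories = {f.findtext("name") for f in root.iter("conceptFactory")}
    problems = []
    for pair in root.iter("dictionaryConceptPair"):
        pair_name = pair.findtext("name")
        dictionary_name = pair.findtext("dictionaryName")
        factory_name = pair.findtext("conceptFactoryName")
        if dictionary_name not in dictionaries:
            problems.append(f"{pair_name}: unknown dictionary {dictionary_name!r}")
        if factory_name not in factories:
            problems.append(f"{pair_name}: unknown concept factory {factory_name!r}")
    return problems


def bsv_paths(xml_text):
    """Return the distinct bsvPath property values, sorted."""
    root = ET.fromstring(xml_text.strip())
    return sorted({p.get("value") for p in root.iter("property")
                   if p.get("key") == "bsvPath"})


if __name__ == "__main__":
    print(check_pairs(SAMPLE), bsv_paths(SAMPLE))
```

An empty problem list only means the names line up; whether each bsvPath actually resolves at runtime is a separate question.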

>

>In addition to detecting your column count/style, upon loading the text
>will be lower-cased and tokenized and the terms will be indexed by rare
>word (for fast lookup).   Also, you do not need to write out the whole
>"C1234567" or "T123" cui tui codes.  The default prefix characters and
>padding zeros are automatically added.   Cuis "1" "01" "C1" "C01" will
>all be stored as "C0000001" and Tuis are handled likewise.  If you have
>custom cuis then it will honor non-"C" prefixes and still pad zeros
>automatically based upon the longest entry.  For instance, if your bsv
>has "CAM1", "CAM12" and "CAM12345" then the stored custom cuis should be
>"CAM00001", "CAM00012" and "CAM12345".
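
The padding rule can be sketched in a few lines of Python. This is an approximation of the behaviour described above, not the cTAKES code: `normalize_codes` is a hypothetical name, the width of 7 digits is inferred from "C0000001", and padding custom prefixes to the longest numeric part per prefix is this sketch's reading of "based upon the longest entry".

```python
import re

# Optional letter prefix followed by digits, e.g. "1", "C01", "CAM12".
_CODE = re.compile(r"([A-Za-z]*)(\d+)")


def normalize_codes(codes, default_prefix="C", default_width=7):
    """Pad codes with zeros, adding the default prefix where it is missing."""
    parts = []
    for code in codes:
        match = _CODE.fullmatch(code.strip())
        if match is None:
            raise ValueError(f"unrecognized code: {code!r}")
        parts.append((match.group(1), match.group(2)))

    # For custom prefixes, pad to the longest numeric part seen per prefix.
    custom_width = {}
    for prefix, digits in parts:
        if prefix not in ("", default_prefix):
            custom_width[prefix] = max(custom_width.get(prefix, 0), len(digits))

    normalized = []
    for prefix, digits in parts:
        if prefix in ("", default_prefix):
            normalized.append(default_prefix + digits.zfill(default_width))
        else:
            normalized.append(prefix + digits.zfill(custom_width[prefix]))
    return normalized


if __name__ == "__main__":
    print(normalize_codes(["1", "01", "C1", "C01"]))
    print(normalize_codes(["CAM1", "CAM12", "CAM12345"]))
```

TUIs could be run through the same function with default_prefix="T" and default_width=3, again as an assumption drawn from the "T123" example.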

>

>I think that is about all that there is to it ...

>

>Sean

>

>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Tuesday, October 06, 2015 4:31 PM
>To: dev@ctakes.apache.org
>Subject: Re: How to update cTAKES so that new top level categories come out based on local dictionary?

>

>Hi Sean,
>
>Thanks so much for your reply. For now I don’t care about the secondary
>codes and I for sure have < 1000 terms. Can you tell me how to wire up
>the BSV file by editing specific places in cTAKES? What specific commands
>should I run, and what should the BSV file format look like? I must admit
>I have never heard of BSV files, and the Internet's answers vary between
>Bluespec System Verilog and BASIC bsave files.
>
>Then after I make the BSV file, what are the next steps? Recompile cTAKES? Can
>I take the BSV file and simply point to it from a binary installation of
>cTAKES? Thank you!
>
>Cheers,
>Chris

>

>

>


>-----Original Message-----
>From: "Finan, Sean" <Sean.Finan@childrens.harvard.edu>
>Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>Date: Tuesday, October 6, 2015 at 8:05 AM
>To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
>Subject: RE: How to update cTAKES so that new top level categories come out based on local dictionary?

>

>

>

>>Hi Chris,
>>
>>There are a few ways to do this:
>>1.  Create an additional dictionary with the terms of interest and add it
>>as a source
>>2.  Create a new dictionary hsqldb that contains everything, old and new
>>3.  Add to the existing hsqldb dictionary
>>
>>The best approach for you would probably depend upon
>>1.  How many new terms you have
>>2.  Whether or not you desire additional codes, e.g. rxnorm, snomed
>>
>>If you don't have many new terms (<1000) and you don't care about
>>secondary codes then the easiest thing would be to create a BSV file with
>>the new terms and cuis.
>>
>>If you have a lot of new terms or do care about secondary codes, then a
>>less facile solution would be to create a new hsqldb with only the new
>>info or a complete replacement with new and old/existing terms.  Of the
>>two hsql options creating a new all-inclusive database would probably be
>>easier unless you want to learn the ins and outs of hsql.  If all of the
>>terms are in the umls, then the new all-inclusive hsqldb would definitely
>>be easiest (I think) as you could use the dictionary tool to create it.
>>
>>If you let me know your exact situation then I may be able to better
>>expound.
>>
>>Sean

>

>>

>

>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>Sent: Monday, October 05, 2015 7:36 PM
>>To: dev@ctakes.apache.org
>>Subject: How to update cTAKES so that new top level categories come out based on local dictionary?

>

>>

>

>>Hi cTAKES team,
>>
>>Hope you’re well! I had a quick question. I was wondering if someone
>>could provide me a step-by-step guide to updating cTAKES to be based
>>off a local dictionary, so that in addition to, e.g.,
>>
>>ProceduralMention
>>  Value1 position etc
>>  Value2 position etc
>>
>>MedicationMention
>>  Value1 position etc
>>  Value2 position etc
>>
>>I also get:
>>
>>NewTopLevelCategoryFromMyDictionary
>>  FoundValue1 position etc
>>  FoundValue2 position etc
>>
>>I realize this has something to do with updating the annotation
>>descriptions etc in XML, so if someone could just tell me what
>>to update I’d really appreciate it.
>>
>>Thank you!
>>
>>Cheers,
>>Chris

>

>>

>

>>

>

>>

>



