ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Date Mon, 30 Sep 2013 21:27:50 GMT
Hi James,
Glad you were able to make cTAKES work for your use case.  

The UMLS subset that is currently included in the resources should be:
*	International Classification of Diseases, Ninth Revision, Clinical Modification, 2012	ICD9CM_2012	ICD9CM	ENG	0	20997
*	International Classification of Diseases, Ninth Revision, Clinical Modification, Metathesaurus additional entry terms, 2012	MTHICD9_2012	ICD9CM	ENG	0	16304
*	Medical Subject Headings, 2012_2011_09_09	MSH2012_2011_09_09	MSH	ENG	0	321367
*	NCI Thesaurus, 2011_02D	NCI2011_02D	NCI	ENG	0	90135
*	SNOMED Clinical Terms, 2011_07_31	SNOMEDCT_2011_07_31	SNOMEDCT	ENG	9	324494

And also RxNorm for the rxnorm_index folder.
(I think there was a readme about it; if not, let's at least add it to the User FAQ?)

--Pei

> -----Original Message-----
> From: Vogel, James [mailto:JVogel@activehealth.net]
> Sent: Monday, September 30, 2013 11:41 AM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> That worked and I see how I can change the code to do both SNOMED and
> ICD9.
> I added an index by doing: CREATE INDEX 'umls_ms_2011ab_cui' ON
> umls_ms_2011ab (cui);  I needed to change the database from 'read-only', is
> that going to cause any other problems?
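
The effect of an index on `cui` can be checked on a small stand-in table. The sketch below uses Python's built-in `sqlite3` rather than the HSQL database that ships with cTAKES, so the DDL and the plan output are SQLite's, not HSQLDB's. Only the table name, the `cui`, `code`, and `sourcetype` columns, and the index name come from this thread; the sample row is made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the cTAKES HSQL dictionary (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")
# Placeholder row; the CUI and code are invented for illustration.
conn.execute("INSERT INTO umls_ms_2011ab VALUES ('C0000001', '000.0', 'ICD9CM')")
# Unquoted index name: single quotes denote string literals in standard SQL.
conn.execute("CREATE INDEX umls_ms_2011ab_cui ON umls_ms_2011ab (cui)")

query = "SELECT code FROM umls_ms_2011ab WHERE cui = ? AND sourcetype = 'ICD9CM'"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("C0000001",)).fetchall()
# The last column of each plan row is a human-readable description of the step.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

If the index name shows up in the printed plan, the lookup by `cui` is using the index instead of scanning the table. HSQLDB has its own tooling for this, so treat the output format as SQLite-specific.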
> 
> What subset of ICD9 is in the dictionary?
> 
> From: Pei Chen [mailto:chenpei@apache.org]
> Sent: Friday, September 27, 2013 11:26 PM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> James,
> Obviously it would be best to customize the code and/or the dictionary for
> your particular case.
> But if you want to try something that will work without any code changes,
> you can try the below in your LookupDesc_Db.xml. Essentially, it takes
> advantage of the fact that the UmlsToSnomedDbConsumerImpl will
> allow you to specify an SQL statement that maps the CUIs to codes, coupled
> with the fact that there already is a table called umls_ms_2011ab which
> contains the codes and CUIs from many different sources including ICD9CM.
> What you could do is just reuse the table as the mapping table as well and
> specify the source such as:
> select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'
> 
> (The downside is that I don't think there is an index on sourcetype so
> performance may suck).
> I've attached an example to normalize to ICD9CM codes instead of
> SNOMEDCT.
> <lookupConsumer
>     className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
>   <properties>
>     <property key="codingScheme" value="ICD9CM"/>
>     <property key="cuiMetaField" value="cui"/>
>     <property key="tuiMetaField" value="tui"/>
>     <property key="anatomicalSiteTuis"
>         value="T021,T022,T023,T024,T025,T026,T029,T030"/>
>     <property key="procedureTuis" value="T059,T060,T061"/>
>     <property key="disorderTuis"
>         value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
>     <property key="findingTuis"
>         value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
>     <property key="dbConnExtResrcKey" value="DbConnection"/>
>     <property key="mapPrepStmt"
>         value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
>   </properties>
> </lookupConsumer>
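
To see what the `mapPrepStmt` query returns for a given CUI, it can be tried outside cTAKES first. This is a minimal sketch against an in-memory SQLite table, not the bundled HSQL database. The helper name `codes_for_cui`, the CUIs, the codes, and the `SNOMEDCT` value for `sourcetype` are all assumptions for illustration; only the table name, the column names, and the `ICD9CM` value come from the query above.

```python
import sqlite3


def codes_for_cui(conn, cui, sourcetype):
    """Return the codes recorded for one CUI in one source vocabulary."""
    rows = conn.execute(
        "SELECT code FROM umls_ms_2011ab "
        "WHERE cui = ? AND sourcetype = ? ORDER BY code",
        (cui, sourcetype),
    ).fetchall()
    return [code for (code,) in rows]


# Stand-in table with invented rows (assumption: the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")
conn.executemany(
    "INSERT INTO umls_ms_2011ab VALUES (?, ?, ?)",
    [
        ("C0000001", "ICD-A", "ICD9CM"),
        ("C0000001", "SCT-A", "SNOMEDCT"),
        ("C0000002", "SCT-B", "SNOMEDCT"),
    ],
)

icd9_codes = codes_for_cui(conn, "C0000001", "ICD9CM")
no_icd9 = codes_for_cui(conn, "C0000002", "ICD9CM")
print(icd9_codes, no_icd9)
```

A CUI with no row for the requested source comes back as an empty list, which is the case where a filtering consumer would have nothing to attach to the mention.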
> 
> On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <chenpei@apache.org> wrote:
> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl; it will not restrict the results to CUIs that
> have SNOMED codes.
> Will you need to preserve the TUI?  One thing is that
> NamedEntityLookupConsumerImpl will return all of the hits, except that
> it'll create OntologyConcepts (w/o TUIs) instead of UMLSConcepts.  Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
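
The difference between the two consumers, as described in this thread, can be pictured with a toy model. This is not the cTAKES implementation: `Hit`, `Concept`, `filtering_consumer`, and `pass_through_consumer` are hypothetical names, and the CUIs, TUI, and code are placeholders.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """A dictionary hit with its CUI and semantic type (TUI)."""
    cui: str
    tui: str


class Concept(NamedTuple):
    """What a consumer stores for a hit; tui is None when it is not kept."""
    cui: str
    tui: Optional[str]
    codes: tuple


def filtering_consumer(hits, codes_by_cui):
    """Keep only hits whose CUI maps to at least one code; the TUI is preserved."""
    return [
        Concept(h.cui, h.tui, tuple(codes_by_cui[h.cui]))
        for h in hits
        if codes_by_cui.get(h.cui)
    ]


def pass_through_consumer(hits):
    """Keep every hit, but without a TUI or mapped codes."""
    return [Concept(h.cui, None, ()) for h in hits]


# Placeholder data: only the first CUI has a code in the target scheme.
hits = [Hit("C0000001", "T000"), Hit("C0000002", "T000")]
codes_by_cui = {"C0000001": ["CODE-A"]}

filtered = filtering_consumer(hits, codes_by_cui)
unfiltered = pass_through_consumer(hits)
print(len(filtered), len(unfiltered))
```

The trade-off is the one raised above: the pass-through variant keeps every hit but loses the TUI, while the filtering variant keeps the TUI but drops hits that have no code in the configured scheme.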
> 
> --Pei
> 
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James <JVogel@activehealth.net> wrote:
> I now see that I use a query on umls_ms_2011ab where sourcetype =
> 'ICD9CM'.  Is there a way to use an existing AE or class to add additional
> ICD9CM annotations / concepts or do I change the code in consumeHits() or
> getSnomedCodes()?
> 
> -----Original Message-----
> From: Vogel, James
> Sent: Friday, September 27, 2013 6:30 PM
> To: dev@ctakes.apache.org
> Subject: RE: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> Is anyone able to provide any more detailed guidance on what I'd need to
> change to add the ICD9 codes as tags, e.g., where do I look for the tables in
> the hsql database that would contain the ICD9 data?
> 
> Thanks.
> 
> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Monday, September 16, 2013 7:25 AM
> To: dev@ctakes.apache.org
> Subject: Re: specificity in selecting EntityMentions when using
> AggregatePlaintextUMLSProcessor
> 
> James,
> I haven't done it myself, so I don't know exactly how the config changes, but
> I know roughly where to look.  In LookupDesc_Db.xml, find the
> <lookupBinding> tag with idRef = DICT_UMLS_MS. Then look under the
> <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
> I believe this is where the actual dictionary filtering is done. There is also a
> consumer class called
> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl
> and a mapPrepStmt field with a SQL query that might need changing. That is
> where I would start looking. I'm not sure whether you would need to write a
> new consumer class, or what values the codingScheme field can take, but
> hopefully this helps you get started until someone else chimes in with more
> detailed info!
> 
> Tim
> 
> On 09/15/2013 08:39 PM, Vogel, James wrote:
> > Any more guidance you can give about the nature of the changes to the
> > config and impl that would need to be made to get the ICD9 codes?
> >
> > -----Original Message-----
> > From: Pei Chen [mailto:chenpei@apache.org]
> > Sent: Wednesday, September 04, 2013 1:02 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: specificity in selecting EntityMentions when using
> > AggregatePlaintextUMLSProcessor
> >
> > Ted,
> >
> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> >> familiar with how to access that information: In the example I've
> >> described below, where would I locate the ICD9 for a specific entity?
> >
> > Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
> > configured [1] to return/store only concepts [2] that have a SNOMEDCT
> > code or RxNorm code.
> >
> > [1]
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
> >
> > [2]
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
> >
> >  If you would like it to return ICD9 codes, one would need to
> > modify/configure the above...
> >
> > --Pei
> >
> >
> > On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
> > <Theodore.Assur@providence.org> wrote:
> >
> >> Thanks for looking into this, it's been puzzling me.
> >>
> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
> >> familiar with how to access that information: In the example I've
> >> described below, where would I locate the ICD9 for a specific entity?
> >>
> >> Thank you
> >>
> >> Ted
> >>
> >> -----Original Message-----
> >> From: Pei Chen [mailto:chenpei@apache.org]
> >> Sent: Tuesday, September 03, 2013 7:13 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: specificity in selecting EntityMentions when using
> >> AggregatePlaintextUMLSProcessor
> >>
> >> You're right, it should have gotten "CIN I"- that's a strange one,
> >> probably needs to be debugged/looked into further...
> >>
> >> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy
> >> <Timothy.Miller@childrens.harvard.edu> wrote:
> >>> Ah. So it will get
> >>> CIN 2 (in SNOMED)
> >>> CIN III (in SNOMED)
> >>> CIN 3 (in SNOMED)
> >>>
> >>> but the rest are not in SNOMED?
> >>>
> >>> I wonder why it doesn't get CIN I? It looks like that exists in
> >>> SNOMED (though I don't fully understand what all the symbols mean in
> >>> the umls browser).
> >>>
> >>>> CIN I - Cervical intraepithelial neoplasia 1
> >>>> [A3002690/SNOMEDCT/SY/285836003]
> >>>
> >>> On 09/03/2013 09:55 PM, Pei Chen wrote:
> >>>> It has the correct parse (POS, chunks, and lookupwindow), but some
> >>>> of the terms do not exist in SNOMED. CIN 2 - Cervical
> >>>> intraepithelial neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists
> >>>> but not CIN II.
> >>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists; that's why it
> >>>> was able to perform the lookup successfully.
> >>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
> >>>> MEDCIN, CCPSS though.  However, the bundled cTAKES dictionaries
> >>>> only contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9) IIRC.
> >>>>
> >>>> --Pei
> >>>>
> >>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
> >>>> <Timothy.Miller@childrens.harvard.edu> wrote:
> >>>>> That is a good question, Ted!
> >>>>>
> >>>>> I tried it with a simple context: "The patient has a CIN III." I'm
> >>>>> not sure if that is a correct context but I was able to duplicate
> >>>>> your findings. (Finds a CUI for CIN III but not if you change it
> >>>>> to CIN II)
> >>>>>
> >>>>> My first thought was that it is the chunker. But the chunker seems
> >>>>> to get it right, as CIN II and CIN III are both called NPs, and
> >>>>> similarly the LookupWindowAnnotator handles them both identically.
> >>>>> So that suggests it is a problem with the actual lookup of the
> >>>>> tokens in the LookupWindow.
> >>>>>
> >>>>> That's all I can do for now but maybe someone else who knows more
> >>>>> about its behavior offhand will have an idea.
> >>>>>
> >>>>> Tim
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
> >>>>>> I'm trying to understand what would prevent the
> >>>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
> >>>>>> problems that are defined in the UMLS version used by cTAKES.
> >>>>>> For example,
> >>>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
> >>>>>> parsed out as UMLS CUI C0206708.
> >>>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported
> >>>>>> with Roman numerals, I, II, and III.
> >>>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
> >>>>>> C0851140: "Carcinoma in situ of uterine cervix."
> >>>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
> >>>>>> as their correct concepts, "Cervical intraepithelial neoplasia grade
> >>>>>> 1" and "Cervical intraepithelial neoplasia grade 2" respectively.
> >>>>>> Is there a way to tune the detection of UMLS concepts?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --------------------------------------------
> >>>>>> Ted Assur
> >>>>>> IT Solutions Architect for Cancer Research Providence Health &
> >>>>>> Services
> >>>>>> ted.assur@providence.org
> >>>>>> 503-215-6476
> >>>>>>
> >>>>>> Crede, ut intelligas.
> >>>>>> Intellego, ut credam.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>   ________________________________
> >>>>>>
> >>>>>> This message is intended for the sole use of the addressee, and
> >>>>>> may contain information that is privileged, confidential and exempt from
> >>>>>> disclosure under applicable law. If you are not the addressee you are
> >>>>>> hereby notified that you may not use, copy, disclose, or distribute
> >>>>>> to anyone the message or any information contained in the message. If
> >>>>>> you have received this message in error, please immediately advise
> >>>>>> the sender by reply email and delete this message.
> > IMPORTANT WARNING: Information contained in this email is intended for
> > the use of the individual to whom it is addressed, and may contain
> > information that is privileged, confidential, and exempt from disclosure
> > under applicable law. If you are not the intended recipient, or the employee
> > or agent responsible for delivering the message to the intended recipient,
> > you are hereby notified that any dissemination, distribution, or copying of
> > this communication is STRICTLY FORBIDDEN. If you have received this
> > communication in error, please notify us immediately by return email and
> > delete this document. Thank you.
> >
> 
