ctakes-dev mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: specificity in selecting EntityMentions when using AggregatePlaintextUMLSProcessor
Date Sat, 28 Sep 2013 03:26:01 GMT
James,
Obviously it would be best to customize the code and/or the dictionary for
your particular case.
But if you want to try something that will work without any code changes,
you can try the below in your LookupDesc_Db.xml.
Essentially, it takes advantage of the fact that
UmlsToSnomedDbConsumerImpl lets you specify an SQL statement that
maps the CUIs to codes, coupled with the fact that there is already a table
called umls_ms_2011ab which contains the codes and CUIs from many
different sources, including ICD9CM.
What you could do is reuse that table as the mapping table as well and
specify the source, such as:
select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'

(The downside is that I don't think there is an index on sourcetype, so
performance may be poor.)
I've attached an example to normalize to ICD9CM codes instead of SNOMEDCT.

<lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl">
  <properties>
    <property key="codingScheme" value="ICD9CM"/>
    <property key="cuiMetaField" value="cui"/>
    <property key="tuiMetaField" value="tui"/>
    <property key="anatomicalSiteTuis" value="T021,T022,T023,T024,T025,T026,T029,T030"/>
    <property key="procedureTuis" value="T059,T060,T061"/>
    <property key="disorderTuis" value="T019,T020,T037,T046,T047,T048,T049,T050,T190,T191"/>
    <property key="findingTuis" value="T033,T034,T040,T041,T042,T043,T044,T045,T046,T056,T057,T184"/>
    <property key="dbConnExtResrcKey" value="DbConnection"/>
    <property key="mapPrepStmt" value="select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"/>
  </properties>
</lookupConsumer>
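
To sanity-check the mapPrepStmt query outside cTAKES, something like the following sketch may help. It is only an illustration under stated assumptions: it uses Python's built-in SQLite rather than the HSQL database that ships with cTAKES, the rows and codes are made-up placeholders (not real UMLS data), and the index name idx_cui_sourcetype is hypothetical. Only the table name, the column names (cui, code, sourcetype) and the query text come from the message above.

```python
import sqlite3

# Stand-in for the cTAKES dictionary database: same table and column names as
# in the query above, but SQLite instead of HSQL and made-up placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE umls_ms_2011ab (cui TEXT, code TEXT, sourcetype TEXT)")
conn.executemany(
    "INSERT INTO umls_ms_2011ab (cui, code, sourcetype) VALUES (?, ?, ?)",
    [
        ("C0000001", "ICD-PLACEHOLDER-1", "ICD9CM"),
        ("C0000001", "SNOMED-PLACEHOLDER-1", "SNOMEDCT"),
        ("C0000002", "SNOMED-PLACEHOLDER-2", "SNOMEDCT"),
    ],
)

# Hypothetical index covering both columns the mapping query filters on,
# as one possible answer to the missing-index concern.
conn.execute("CREATE INDEX idx_cui_sourcetype ON umls_ms_2011ab (cui, sourcetype)")

# Same statement as the mapPrepStmt property value.
MAP_PREP_STMT = "select code from umls_ms_2011ab where cui=? and sourcetype='ICD9CM'"


def icd9_codes(cui):
    """Return the ICD9CM codes that the mapping query yields for one CUI."""
    return [row[0] for row in conn.execute(MAP_PREP_STMT, (cui,))]


print(icd9_codes("C0000001"))  # CUI with an ICD9CM row
print(icd9_codes("C0000002"))  # CUI with only a SNOMEDCT row
```

A CUI that has no ICD9CM row returns an empty list here. If the consumer treats an empty mapping the way the default SNOMED setup discussed in this thread does, such concepts would presumably be dropped, so it is worth checking a few CUIs of interest before relying on the change.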


On Fri, Sep 27, 2013 at 9:58 PM, Pei Chen <chenpei@apache.org> wrote:

> James,
> One can try the NamedEntityLookupConsumerImpl instead of
> UmlsToSnomedDbConsumerImpl, since it will not filter the CUIs down to only
> those that have SNOMED codes.
> Will you need to preserve the TUI?  One thing is that
> NamedEntityLookupConsumerImpl will return back all of the hits, except that
> it'll create OntologyConcepts (w/o TUI's) instead of UMLSConcepts.  Perhaps
> we should make the NamedEntityLookupConsumerImpl a bit more general.
>
> --Pei
>
>
> On Fri, Sep 27, 2013 at 8:29 PM, Vogel, James <JVogel@activehealth.net> wrote:
>
>> I now see that I use a query on umls_ms_2011ab where sourcetype =
>> 'ICD9CM'.  Is there a way to use an existing AE or class to add additional
>> ICD9CM annotations / concepts or do I change the code in consumeHits() or
>> getSnomedCodes()?
>>
>> -----Original Message-----
>> From: Vogel, James
>> Sent: Friday, September 27, 2013 6:30 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> Is anyone able to provide any more detailed guidance on what I'd need to
>> change to add the ICD9 codes as tags, e.g., where do I look for the tables
>> in the hsql database that would contain the ICD9 data?
>>
>> Thanks.
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
>> Sent: Monday, September 16, 2013 7:25 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: specificity in selecting EntityMentions when using
>> AggregatePlaintextUMLSProcessor
>>
>> James,
>> I haven't done it myself, so I don't know exactly how the config
>> changes, but I know roughly where to look.  In the LookupDesc_Db.xml,
>> find the <lookupBinding> tag with the idRef = DICT_UMLS_MS. Then look under
>> the <lookupConsumer> section, and you'll see the codingScheme is SNOMED.
>> I believe this is where the actual dictionary filtering is done. There
>> is also a consumer class called
>> org.apache.ctakes.dictionary.lookup.ae.UmlsToSnomedDbConsumerImpl and a
>> mapPrepStmt field with a SQL query that might need changing. That is
>> where I would start looking. I'm not sure whether you would need to
>> write a new consumer class, or what values the codingScheme field can
>> take, but hopefully this helps you get started until someone else chimes
>> in with more detailed info!
>>
>> Tim
>>
>> On 09/15/2013 08:39 PM, Vogel, James wrote:
>> > Any more guidance you can give about the nature of the changes to the
>> > config and impl that would need to be made to get the ICD9 codes?
>> >
>> > -----Original Message-----
>> > From: Pei Chen [mailto:chenpei@apache.org]
>> > Sent: Wednesday, September 04, 2013 1:02 PM
>> > To: dev@ctakes.apache.org
>> > Subject: Re: specificity in selecting EntityMentions when using
>> > AggregatePlaintextUMLSProcessor
>> >
>> > Ted,
>> >
>> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> >> familiar with how to access that information: In the example I've
>> >> described below, where would I locate the ICD9 for a specific entity?
>> >
>> > Even though ICD9 is included in the lookup, IIRC, cTAKES by default is
>> > configured [1] to return/store only concepts [2] that have a SNOMEDCT
>> > code or RxNorm code.
>> >
>> > [1] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
>> >
>> > [2] http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/ae/UmlsToSnomedConsumerImpl.java
>> >
>> >  If you would like it to return ICD9 codes, one would need to
>> > modify/configure the above...
>> >
>> > --Pei
>> >
>> >
>> > On Wed, Sep 4, 2013 at 11:55 AM, Assur, Ted
>> > <Theodore.Assur@providence.org> wrote:
>> >
>> >> Thanks for looking into this, it's been puzzling me.
>> >>
>> >> On another note, I know the cTAKES dictionary uses ICD9, but I'm not
>> >> familiar with how to access that information: In the example I've
>> >> described below, where would I locate the ICD9 for a specific entity?
>> >>
>> >> Thank you
>> >>
>> >> Ted
>> >>
>> >> -----Original Message-----
>> >> From: Pei Chen [mailto:chenpei@apache.org]
>> >> Sent: Tuesday, September 03, 2013 7:13 PM
>> >> To: dev@ctakes.apache.org
>> >> Subject: Re: specificity in selecting EntityMentions when using
>> >> AggregatePlaintextUMLSProcessor
>> >>
>> >> You're right, it should have gotten "CIN I"- that's a strange one,
>> >> probably needs to be debugged/looked into further...
>> >>
>> >> On Tue, Sep 3, 2013 at 10:05 PM, Miller, Timothy <
>> >> Timothy.Miller@childrens.harvard.edu> wrote:
>> >>> Ah. So it will get
>> >>> CIN 2 (in SNOMED)
>> >>> CIN III (in SNOMED)
>> >>> CIN 3 (in SNOMED)
>> >>>
>> >>> but the rest are not in SNOMED?
>> >>>
>> >>> I wonder why it doesn't get CIN I? It looks like that exists in SNOMED
>> >>> (though I don't fully understand what all the symbols mean in the umls
>> >>> browser).
>> >>>
>> >>>> CIN I - Cervical intraepithelial neoplasia 1
>> >>>> [A3002690/SNOMEDCT/SY/285836003]
>> >>>
>> >>> On 09/03/2013 09:55 PM, Pei Chen wrote:
>> >>>> It has the correct parse (POS, chunks, and lookup window), but some of
>> >>>> the terms do not exist in SNOMED. CIN 2 - Cervical intraepithelial
>> >>>> neoplasia 2 [A3002688/SNOMEDCT/SY/285838002] exists, but not CIN II.
>> >>>> CIN III [A3333965/SNOMEDCT/SY/20365006] also exists, which is why it was
>> >>>> able to perform the lookup successfully.
>> >>>> Note that CIN II synonyms do exist in other UMLS thesauri such as
>> >>>> MEDCIN and CCPSS, though.  However, the bundled cTAKES dictionaries only
>> >>>> contain (MeSH, SNOMEDCT, RxNORM, NCI, ICD9), IIRC.
>> >>>>
>> >>>> --Pei
>> >>>>
>> >>>> On Tue, Sep 3, 2013 at 9:44 PM, Miller, Timothy
>> >>>> <Timothy.Miller@childrens.harvard.edu> wrote:
>> >>>>> That is a good question, Ted!
>> >>>>>
>> >>>>> I tried it with a simple context: "The patient has a CIN III." I'm
>> >>>>> not sure if that is a correct context but I was able to duplicate
>> >>>>> your findings. (Finds a CUI for CIN III but not if you change it to
>> >>>>> CIN II)
>> >>>>>
>> >>>>> My first thought was that it is the chunker. But the chunker seems
>> >>>>> to get it right, as CIN II and CIN III are both called NPs, and
>> >>>>> similarly the LookupWindowAnnotator handles them both identically.
>> >>>>> So that suggests it is a problem with the actual lookup of the
>> >>>>> tokens in the LookupWindow.
>> >>>>>
>> >>>>> That's all I can do for now but maybe someone else who knows more
>> >>>>> about its behavior offhand will have an idea.
>> >>>>>
>> >>>>> Tim
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 09/03/2013 08:24 PM, Assur, Ted wrote:
>> >>>>>> I'm trying to understand what would prevent the
>> >>>>>> AggregatePlaintextUMLSProcessor AE from correctly parsing specific
>> >>>>>> problems that are defined in the UMLS version used by cTAKES.
>> >>>>>> For example,
>> >>>>>> CIN (Cervical Intraepithelial Neoplasia) in its general usage is
>> >>>>>> parsed out as UMLS CUI C0206708.
>> >>>>>> CIN comes in 3 grades, 1, 2 and 3. Sometimes this is reported with
>> >>>>>> Roman numerals, I, II, and III.
>> >>>>>> cTAKES correctly identifies "CIN 3" and "CIN III" with UMLS CUI
>> >>>>>> C0851140: "Carcinoma in situ of uterine cervix."
>> >>>>>> However, I cannot get it to recognize CIN 1, CIN I, CIN 2, or CIN II
>> >>>>>> as their correct concepts, "Cervical intraepithelial neoplasia grade 1"
>> >>>>>> and "Cervical intraepithelial neoplasia grade 2" respectively.
>> >>>>>> Is there a way to tune the detection of UMLS concepts?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --------------------------------------------
>> >>>>>> Ted Assur
>> >>>>>> IT Solutions Architect for Cancer Research Providence Health &
>> >>>>>> Services ted.assur@providence.org
>> >>>>>> 503-215-6476
>> >>>>>>
>> >>>>>> Crede, ut intelligas.
>> >>>>>> Intellego, ut credam.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>   ________________________________
>> >>>>>>
>> >>>>>> This message is intended for the sole use of the addressee, and may
>> >>>>>> contain information that is privileged, confidential and exempt from
>> >>>>>> disclosure under applicable law. If you are not the addressee you are
>> >>>>>> hereby notified that you may not use, copy, disclose, or distribute to
>> >>>>>> anyone the message or any information contained in the message. If you
>> >>>>>> have received this message in error, please immediately advise the
>> >>>>>> sender by reply email and delete this message.
>> >>
>> >>
>> > IMPORTANT WARNING: Information contained in this email is intended for
>> > the use of the individual to whom it is addressed, and may contain
>> > information that is privileged, confidential, and exempt from disclosure
>> > under applicable law. If you are not the intended recipient, or the
>> > employee or agent responsible for delivering the message to the intended
>> > recipient, you are hereby notified that any dissemination, distribution, or
>> > copying of this communication is STRICTLY FORBIDDEN. If you have received
>> > this communication in error, please notify us immediately by return email
>> > and delete this document. Thank you.
>> >
>>
>>
>>
>
>
