ctakes-user mailing list archives

From "Lee, Richard A. [USA]" <lee_rich...@bah.com>
Subject RE: [External] Re: Problems with TUI filtering and other annotation omissions
Date Tue, 29 Apr 2014 19:57:06 GMT
Thank you for that pointer. Unfortunately, org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
does not have the missing annotations.

I noticed that a later post to this list asked a similar question about adding TUIs to
LookupDesc_Db.xml, and the answer was that the cTAKES code in UmlsToSnomedConsumerImpl only
looks for certain TUI “groups”. That would explain why my shot-in-the-dark of using
“chemicalanddrugTuis” did not work. I changed that key to “medicationTuis”, as suggested
by the code, and most of the expected additional terms were then annotated.

So that partially answers my question. Two things remain mysteries: the phrases that are
still missed even though their TUIs are now in the added list, and the phrases that are still
not annotated even though I added T058 to the existing element with group “procedureTuis”.
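
A quick way to catch the group-name problem described above is to check the property keys
in the lookup descriptor against the group names the consumer is known to read. The sketch
below is only an illustration: RECOGNIZED_GROUPS holds just the two names mentioned in this
thread (the authoritative list is in UmlsToSnomedConsumerImpl and may be longer), and the
<properties> wrapper element and the function name are assumptions, not cTAKES API.

```python
import xml.etree.ElementTree as ET

# Assumption: only the two group names mentioned in this thread are listed here.
# The real set lives in UmlsToSnomedConsumerImpl and may contain more names.
RECOGNIZED_GROUPS = {"medicationTuis", "procedureTuis"}

def unrecognized_groups(xml_text, recognized=RECOGNIZED_GROUPS):
    """Return the *Tuis property keys that are not in the recognized set."""
    root = ET.fromstring(xml_text)
    keys = [p.get("key", "") for p in root.iter("property")]
    return sorted(k for k in keys if k.endswith("Tuis") and k not in recognized)

# The two property lines from the original message, wrapped in a
# hypothetical <properties> element so the fragment parses on its own.
SAMPLE = """
<properties>
  <property key="procedureTuis" value="T058,T059,T060,T061"/>
  <property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>
</properties>
"""

flagged = unrecognized_groups(SAMPLE)
print(flagged)
```

With the sample above, the made-up key “chemicalanddrugTuis” is the one that gets flagged;
renaming it to “medicationTuis” leaves nothing flagged.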

----

From: Pei Chen [mailto:chenpei@apache.org]
Sent: Fri, 04 Apr, 2014 16:33
To: user@ctakes.apache.org
Subject: [External] Re: Problems with TUI filtering and other annotation omissions

Richard,
org.apache.ctakes.assertion.medfacts.types.Concept is an internal type used by the assertion
module. Could you see what is returned in
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation?


On Fri, Apr 4, 2014 at 3:56 PM, Lee, Richard A. [USA] <lee_richard@bah.com> wrote:
I ran several documents through cTAKES, using AggregatePlaintextUMLSProcessor, and examined
the list of org.apache.ctakes.assertion.medfacts.types.Concept annotations produced for each.
From those results, I made up a list of phrases I had hoped cTAKES would annotate but did
not. I used MetaMap to look up each of those phrases, and found that approximately 150 of
them resulted in a full-phrase match and a corresponding CUI.

I used the MetamorphoSys scripts to load the UMLS RRF data set into a SQL DB, and queried
the DB to confirm that those ~150 phrases were indeed present with the expected CUIs. So,
the question becomes, why didn’t cTAKES annotate them?
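
The confirmation step above can be sketched as a small script. This is a minimal
illustration, not the query actually used: it assumes the strings are in a table named
MRCONSO with the standard CUI and STR columns, and it uses an in-memory SQLite table with
made-up placeholder rows in place of the real database. The function name find_cuis is
hypothetical.

```python
import sqlite3

def find_cuis(conn, phrases):
    """Map each phrase to the sorted CUIs whose STR matches it exactly (case-insensitive)."""
    found = {}
    for phrase in phrases:
        rows = conn.execute(
            "SELECT DISTINCT CUI FROM MRCONSO WHERE LOWER(STR) = LOWER(?)",
            (phrase,),
        ).fetchall()
        found[phrase] = sorted(row[0] for row in rows)
    return found

# Toy stand-in for the loaded UMLS table; the CUIs and strings are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MRCONSO (CUI TEXT, STR TEXT)")
conn.executemany(
    "INSERT INTO MRCONSO VALUES (?, ?)",
    [("C0000001", "Example Phrase"), ("C0000002", "another phrase")],
)

result = find_cuis(conn, ["example phrase", "missing phrase"])
print(result)
```

A phrase that maps to an empty list is absent from the table; a phrase that maps to one
or more CUIs is present, so its absence from the cTAKES output needs another explanation.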

Looking at the cTAKES logs, it appears the OrangeBookFilter “Filtered out” only 5 out
of the 150.

The other possible cause I could think of was the TUI filtering; there was no evidence of
it in the logs, but I don’t know whether the results of filtering in that step get logged
by default or not. I looked up in the DB the TUIs for each of the phrases, compared them to
the lists of “allowed” TUIs in LookupDesc_Db.xml, and concluded that the TUI filtering
might account for 44 of the phrases. So the rest remain a mystery.
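
The comparison described above (TUIs per phrase versus the “allowed” lists) can be
sketched as follows. This is a rough illustration under stated assumptions: the
<properties> wrapper element, the function names, and the phrase-to-TUI mapping are all
hypothetical, and the sample list is the “procedureTuis” value before T058 was added.

```python
import xml.etree.ElementTree as ET

def allowed_tuis(xml_text):
    """Collect {group key: set of TUIs} from every property whose key ends in 'Tuis'."""
    root = ET.fromstring(xml_text)
    groups = {}
    for prop in root.iter("property"):
        key = prop.get("key", "")
        if key.endswith("Tuis"):
            values = prop.get("value", "").split(",")
            groups[key] = {t.strip() for t in values if t.strip()}
    return groups

def split_by_filter(phrase_tuis, groups):
    """Split phrases into (kept, dropped) by whether any of their TUIs is allowed."""
    allowed = set().union(*groups.values())
    kept, dropped = [], []
    for phrase, tuis in phrase_tuis.items():
        (kept if allowed & set(tuis) else dropped).append(phrase)
    return sorted(kept), sorted(dropped)

# Hypothetical wrapper element around one TUI list from the descriptor.
DESCRIPTOR = '<properties><property key="procedureTuis" value="T059,T060,T061"/></properties>'

# Placeholder phrases; in practice the TUIs would come from the DB lookup.
phrase_tuis = {"phrase A": ["T058"], "phrase B": ["T060"]}

kept, dropped = split_by_filter(phrase_tuis, allowed_tuis(DESCRIPTOR))
print(kept, dropped)
```

Phrases in the dropped list are the ones the TUI filter alone could explain; anything in
the kept list that cTAKES still misses must be lost somewhere else.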

I modified the TUI lists in LookupDesc_Db.xml to add TUIs, in the hopes that that would cause
the corresponding phrases to be annotated. Specifically, I added T058 to one list, and added
a second list with a handful of TUIs:

<property key="procedureTuis" value="T058,T059,T060,T061"/>
<property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>

T058 corresponded to 3 of the phrases on my list; T121 alone accounted for 24 of them. But,
upon restarting cTAKES with that modified file, and running relevant documents, I found that
the expected phrases were still not annotated. I even tried making the same change in LookupDesc.xml
just in case, to no avail.

So, the questions are:

- Are there reasons beyond the OrangeBook and TUI filters why CUI-associated phrases in UMLS
would not get annotated?

- Do TUI-filter results get logged by default, and if not, is there a way (log4j settings?)
to log them without making code changes?

- Am I doing the TUI filter changes wrong?

Thanks for any answers and advice.
