ctakes-user mailing list archives

From "Lee, Richard A. [USA]" <lee_rich...@bah.com>
Subject Problems with TUI filtering and other annotation omissions
Date Fri, 04 Apr 2014 19:56:27 GMT
I ran several documents through cTAKES, using AggregatePlaintextUMLSProcessor, and examined
the list of org.apache.ctakes.assertion.medfacts.types.Concept annotations produced for each.
From those results, I compiled a list of phrases I had hoped cTAKES would annotate but did
not. I looked up each of those phrases in MetaMap, and found that approximately 150 of
them produced a full-phrase match with a corresponding CUI.

I used the MetamorphoSys scripts to load the UMLS RRF data set into a SQL DB, and queried
the DB to confirm that those ~150 phrases were indeed present with the expected CUIs. So,
the question becomes, why didn’t cTAKES annotate them?
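For anyone wanting to reproduce the DB check, here is a minimal sketch of the kind of lookup described above. It assumes the standard UMLS RRF tables MRCONSO (CUI, STR) and MRSTY (CUI, TUI) were loaded under those names. The in-memory SQLite database, the example phrase, and the CUI value are placeholders for illustration, not data from my run, and the helper name lookup_phrases is made up.

```python
import sqlite3


def lookup_phrases(conn, phrases):
    """Map each phrase to the sorted (CUI, TUI) pairs whose STR matches it exactly,
    ignoring case. An empty list means the phrase is not in the loaded data."""
    sql = (
        "SELECT DISTINCT c.CUI, s.TUI "
        "FROM MRCONSO c JOIN MRSTY s ON s.CUI = c.CUI "
        "WHERE LOWER(c.STR) = LOWER(?)"
    )
    return {p: sorted(conn.execute(sql, (p,)).fetchall()) for p in phrases}


# Toy stand-in for the real UMLS database; the row values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MRCONSO (CUI TEXT, STR TEXT)")
conn.execute("CREATE TABLE MRSTY (CUI TEXT, TUI TEXT)")
conn.execute("INSERT INTO MRCONSO VALUES ('C0000001', 'Example Phrase')")
conn.execute("INSERT INTO MRSTY VALUES ('C0000001', 'T121')")

hits = lookup_phrases(conn, ["example phrase", "absent phrase"])
for phrase, pairs in hits.items():
    print(phrase, pairs)
```

Returning the TUI alongside the CUI in one pass makes it easy to feed the same result into a TUI-filter comparison later.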

Looking at the cTAKES logs, it appears the OrangeBookFilter “Filtered out” only 5 out
of the 150.

The other possible cause I could think of was the TUI filtering. There was no evidence of
it in the logs, but I don’t know whether that step logs what it filters out
by default. I looked up the TUIs for each of the phrases in the DB, compared them to
the lists of “allowed” TUIs in LookupDesc_Db.xml, and concluded that the TUI filtering
might account for 44 of the phrases. So the rest remain a mystery.

I modified the TUI lists in LookupDesc_Db.xml to add TUIs, hoping that would cause
the corresponding phrases to be annotated. Specifically, I added T058 to one list, and added
a second list with a handful of TUIs:

<property key="procedureTuis" value="T058,T059,T060,T061"/>
<property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>

T058 corresponded to 3 of the phrases on my list; T121 alone accounted for 24 of them. But,
upon restarting cTAKES with that modified file, and running relevant documents, I found that
the expected phrases were still not annotated. I even tried making the same change in LookupDesc.xml
just in case, to no avail.
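As a sanity check on the comparison itself, the following rough sketch shows how the phrase TUIs can be tested against the allowed lists. It assumes the lists appear as property elements with key and value attributes, as in the two lines above, and that every key ending in "Tuis" is an allowed list. The wrapping properties element, the function names, the phrase names, and the TUI "T999" are placeholders; this only mimics the filter's apparent intent and is not cTAKES code.

```python
import xml.etree.ElementTree as ET


def allowed_tuis(xml_text):
    """Collect every TUI listed in a <property> whose key ends with 'Tuis'."""
    root = ET.fromstring(xml_text)
    allowed = set()
    for prop in root.iter("property"):
        if prop.get("key", "").endswith("Tuis"):
            values = prop.get("value", "").split(",")
            allowed.update(t.strip() for t in values if t.strip())
    return allowed


def split_by_filter(phrase_tuis, allowed):
    """Split phrases into those with at least one allowed TUI and those with none."""
    kept, dropped = {}, {}
    for phrase, tuis in phrase_tuis.items():
        target = kept if set(tuis) & allowed else dropped
        target[phrase] = tuis
    return kept, dropped


# The two property lines are from the modified file; the wrapper element is assumed.
sample = """<properties>
<property key="procedureTuis" value="T058,T059,T060,T061"/>
<property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>
</properties>"""

allowed = allowed_tuis(sample)
# Placeholder phrases; "T999" stands in for a TUI that is on no list.
kept, dropped = split_by_filter({"phrase A": ["T121"], "phrase B": ["T999"]}, allowed)
print("kept:", kept)
print("dropped:", dropped)
```

Any phrase that lands in kept here but still goes unannotated would point to a cause other than the TUI lists, or to the edited file not being the one cTAKES actually loads.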

So, the questions are:

- Are there reasons beyond the OrangeBook and TUI filters why CUI-associated phrases in UMLS
would not get annotated?

- Do TUI-filter results get logged by default, and if not, is there a way (log4j settings?)
to log them without making code changes?

- Am I doing the TUI filter changes wrong?

Thanks for any answers and advice.