ctakes-user mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: Problems with TUI filtering and other annotation omissions
Date Fri, 04 Apr 2014 20:33:17 GMT
Richard,
org.apache.ctakes.assertion.medfacts.types.Concept is an internal type used
by the assertion module.
Could you check what is returned for
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation instead?



On Fri, Apr 4, 2014 at 3:56 PM, Lee, Richard A. [USA]
<lee_richard@bah.com> wrote:

> I ran several documents through cTAKES, using
> AggregatePlaintextUMLSProcessor, and examined the list of
> org.apache.ctakes.assertion.medfacts.types.Concept annotations produced for
> each. From those results, I made up a list of phrases I had hoped cTAKES
> would annotate but did not. I used MetaMap to look up each of those
> phrases, and found that approximately 150 of them resulted in a full-phrase
> match and a corresponding CUI.
>
>
>
> I used the MetamorphoSys scripts to load the UMLS RRF data set into a SQL
> DB, and queried the DB to confirm that those ~150 phrases were indeed
> present with the expected CUIs. So, the question becomes, why didn't cTAKES
> annotate them?
>
>
>
> Looking at the cTAKES logs, it appears the OrangeBookFilter "Filtered out"
> only 5 out of the 150.
>
>
>
> The other possible cause I could think of was the TUI filtering; there was
> no evidence of it in the logs, but I don't know whether the results of
> filtering in that step get logged by default or not. I looked up in the DB
> the TUIs for each of the phrases, compared them to the lists of "allowed"
> TUIs in LookupDesc_Db.xml, and concluded that the TUI filtering might
> account for 44 of the phrases. So the rest remain a mystery.
>
>
>
> I modified the TUI lists in LookupDesc_Db.xml to add TUIs, hoping that
> this would cause the corresponding phrases to be annotated.
> Specifically, I added T058 to one list, and added a second list with a
> handful of TUIs:
>
>
>
> <property key="procedureTuis" value="T058,T059,T060,T061"/>
>
> <property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>
>
>
>
> T058 corresponded to 3 of the phrases on my list; T121 alone accounted for
> 24 of them. But, upon restarting cTAKES with that modified file, and
> running relevant documents, I found that the expected phrases were still
> not annotated. I even tried making the same change in LookupDesc.xml just
> in case, to no avail.
>
>
>
> So, the questions are:
>
>
>
> - Are there reasons beyond the OrangeBook and TUI filters why
> CUI-associated phrases in UMLS would not get annotated?
>
>
>
> - Do TUI-filter results get logged by default, and if not, is there a way
> (log4j settings?) to log them without making code changes?
>
>
>
> - Am I doing the TUI filter changes wrong?
>
>
>
> Thanks for any answers and advice.
>
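
The manual comparison described in the quoted message (looking up each phrase's TUI and checking it against the "allowed" lists) can be sketched as a small standalone check. This is only an illustration using the JDK's XML parser, not cTAKES's own filtering code: the class name TuiFilterCheck, the wrapping <properties> element, and the sample TUI T999 are assumptions made up for the example, and only the two <property> lines come from the message. It shows what the edited lists contain; it cannot tell whether cTAKES actually loaded that file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of cTAKES.
public class TuiFilterCheck {

    /** Collects every <property key="...Tuis" value="..."/> as key -> set of TUIs. */
    public static Map<String, Set<String>> readTuiLists(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML fragment", e);
        }
        Map<String, Set<String>> lists = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String key = prop.getAttribute("key");
            if (!key.endsWith("Tuis")) {
                continue; // ignore properties that are not TUI lists
            }
            Set<String> tuis = new LinkedHashSet<>();
            for (String tui : prop.getAttribute("value").split(",")) {
                String trimmed = tui.trim();
                if (!trimmed.isEmpty()) {
                    tuis.add(trimmed);
                }
            }
            lists.put(key, tuis);
        }
        return lists;
    }

    /** Returns the key of the first list that contains the TUI, or null if none does. */
    public static String allowedBy(Map<String, Set<String>> lists, String tui) {
        for (Map.Entry<String, Set<String>> entry : lists.entrySet()) {
            if (entry.getValue().contains(tui)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The <properties> wrapper is an assumption; the two property lines are from the message.
        String xml = "<properties>"
                + "<property key=\"procedureTuis\" value=\"T058,T059,T060,T061\"/>"
                + "<property key=\"chemicalanddrugTuis\" value=\"T109,T110,T116,T121,T123\"/>"
                + "</properties>";
        Map<String, Set<String>> lists = readTuiLists(xml);
        // T058 and T121 are mentioned in the message; T999 is a made-up TUI for contrast.
        for (String tui : new String[] {"T058", "T121", "T999"}) {
            String key = allowedBy(lists, tui);
            System.out.println(tui + " -> " + (key == null ? "not in any list" : key));
        }
    }
}
```

Running the same check against the TUIs of all ~150 phrases would reproduce the "44 phrases" estimate from the configuration side, leaving the remaining phrases to be explained by something other than the TUI lists.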
