From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: How to augment/modify UMLS resources?
Date Mon, 06 Jan 2014 21:49:37 GMT
Hi Lee,

As you have discovered, dictionary1.csv is not used by AggregatePlainTextProcessor.xml.

AggregatePlainTextProcessor.xml uses a (tiny) Lucene index for the few words, like knee and
pain, that are annotated without using the larger UMLS resource.

I think to use the csv instead of the other methods, you would modify AggregatePlainTextProcessor.xml
to refer to 
instead of 
where there currently is the line
<import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml"/>
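
A minimal sketch of that one-line swap, assuming the CSV descriptor sits in the same directory under the name used further down in this message (DictionaryLookupAnnotatorCSV.xml). The delegate key shown is illustrative and may differ from the one in the shipped file:

```xml
<!-- Before: the delegate imports the default lookup descriptor. -->
<delegateAnalysisEngine key="DictionaryLookupAnnotator">
  <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml"/>
</delegateAnalysisEngine>

<!-- After: same delegate and key, but importing the CSV-backed descriptor.
     Only the file name at the end of the location changes. -->
<delegateAnalysisEngine key="DictionaryLookupAnnotator">
  <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorCSV.xml"/>
</delegateAnalysisEngine>
```

Because the key is unchanged in this sketch, the existing entry in the flow would not need to be touched.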

I can't remember offhand if there are parameters you have to update too, but I don't think

Hopefully that will give you an idea then of what to add to AggregatePlainTextUMLSProcessor.xml
to get both the UMLS and your csv dictionary in effect.

(You would add the following to the delegateAnalysisEngine list of AggregatePlainTextUMLSProcessor.xml

<delegateAnalysisEngine key="DictionaryLookupAnnotator">
<import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorCSV.xml"/>
</delegateAnalysisEngine>

And then add 
Just before or just after
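
In a UIMA aggregate descriptor, a delegate only runs if its key is also listed in the flow, so a second lookup annotator needs both a delegate entry and a flow node. A rough sketch of the two pieces together follows. The key DictionaryLookupAnnotatorCSV and the placeholder ExistingUmlsLookupKey are names made up for illustration (not taken from the shipped descriptor), and the sketch uses a key different from the one above on the assumption that keys must be unique within the aggregate:

```xml
<delegateAnalysisEngineSpecifiers>
  <!-- ... existing delegates, including the UMLS lookup, stay unchanged ... -->

  <!-- New delegate for the CSV-backed lookup (hypothetical key name). -->
  <delegateAnalysisEngine key="DictionaryLookupAnnotatorCSV">
    <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorCSV.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>

<flowConstraints>
  <fixedFlow>
    <!-- ... earlier nodes unchanged ... -->
    <!-- Placeholder: whatever key the existing UMLS lookup delegate already uses. -->
    <node>ExistingUmlsLookupKey</node>
    <!-- New node: must match the key of the delegate added above. -->
    <node>DictionaryLookupAnnotatorCSV</node>
    <!-- ... later nodes unchanged ... -->
  </fixedFlow>
</flowConstraints>
```

Whether the new node goes just before or just after the UMLS lookup node should only affect the order in which the two sets of annotations are created.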

-- James

-----Original Message-----
From: user-return-457-Masanz.James=mayo.edu@ctakes.apache.org [mailto:user-return-457-Masanz.James=mayo.edu@ctakes.apache.org]
On Behalf Of Lee, Richard A. [USA]
Sent: Monday, January 06, 2014 3:09 PM
To: user@ctakes.apache.org
Subject: RE: How to augment/modify UMLS resources?

Thanks, James.

I am leaning toward supplementing the UMLS DB as you suggest rather than changing it, if I
can make that work. I did originally try adding entries to dictionary1.csv, while using AggregatePlainTextProcessor.xml,
but saw no change in the annotations. I guess that dictionary1 is in fact not being used in
APTP.xml, and "hyperlipidemia", "knee", "pain", et al get annotated due to some other term
list / dictionary. Time to wade through the contents of that ctakes-dictionary-lookup-res\src\...

-----Original Message-----
From: Masanz, James J. [mailto:Masanz.James@mayo.edu] 
Sent: Fri, 03 Jan, 2014 16:45
To: 'user@ctakes.apache.org'
Subject: [External] RE: How to augment/modify UMLS resources?

The separately downloadable UMLS dictionary formatted for cTAKES [1], not counting medication
names (RxNorm), is in a database [2]. So you could add to that database whatever terms you

The RxNorm dictionary is in a Lucene index (though there is a related jira ticket open so
that maybe it will end up in the same database), so adding to the currently used medications
list would probably best be done programmatically using the Lucene API (someone with more
Lucene end-user experience, please chime in).

cTAKES provides a way to look up terms in a flatfile dictionary that you would provide. See
the files that end with .csv within ctakes-dictionary-lookup-res\src\main\resources\org\apache\ctakes\dictionary\lookup

The flatfile is not used directly in conjunction with the database file of terms from UMLS
– to use the two together, you would have one annotator configured to use that flatfile
for the dictionary, and have a second annotator configured to use the database file.
Some things to be aware of if you went that route:
 - each note would be processed by both, and if you had terms in your flatfile that duplicated
what was in the database, you would end up with double annotations
 - each note would be processed in effect twice (not by the entire pipeline thankfully) so
it would be slower than just using one.

As far as something being annotated that you don't want annotated, the LookupDesc*xml
file being used can contain an excludeList, which would stop "men" from being annotated. See
LookupDesc_DrugNER.xml for an example of using excludeList.

Any improvements or even written steps on any of the above would be a great contribution.

-- James

[1] http://sourceforge.net/projects/ctakesresources/files/
[2] the relative path to the hsql db is resources\org\apache\ctakes\dictionary\lookup\umls2011ab

From: user-return-451-Masanz.James=mayo.edu@ctakes.apache.org [mailto:user-return-451-Masanz.James=mayo.edu@ctakes.apache.org]
On Behalf Of Lee, Richard A. [USA]
Sent: Thursday, January 02, 2014 5:01 PM
To: user@ctakes.apache.org
Subject: How to augment/modify UMLS resources?

Howdy, all. I’ve got a lot of experience with various commercial extraction tools, but I’m
new to cTAKES and UIMA, so please bear with me.

I am able to use my UMLS credentials to process documents, and the results are good. But there
are a few things I wish to change in the medfacts.types.Concept and AnatomicalSiteMention
areas, for starters. For example, while it annotates “orbicularis oculi” as a concept,
it does not annotate “musculus orbicularis oculi”, “septum orbital”, or “oculi medialis”.
It annotates “ulceration”, “perforation”, and “corneal perforation” but not “corneal
ulceration”. It annotates “men” (as in “Chinese men”) as a “problem”. It annotates
“ER” (i.e., Emergency Room) as an AnatomicalSiteReference.

So, the question becomes, how do I address these? Do I need to somehow re-generate (with changes)
the UMLS data files, probably using Luke or some such? That seems a bit crude. Is there a
clean way to supplement those data files instead to achieve the desired changes?

Thanks in advance.

Richard A Lee || Lead Associate / Senior Ontologist || lee_richard@bah.com || 571-482-7809

