ctakes-user mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: Analyzing and processing cTakes NLP output
Date Thu, 10 Oct 2013 00:18:28 GMT
Yeah, a CPE is one way to go for reading a set of documents and then outputting specific information.
If you go that route, given your desired outcomes, you would have to write a UIMA Consumer
class to extract all the things you specified and put them somewhere.

Alternatively, many of our projects are moving towards using UIMAFit, which allows you to
do many of the same things without having to deal with xml configuration files. A good place
to start with that approach is the class:

and its parent class:

It has a main method, so you can run it like a normal Java program. It will run the standard
cTAKES pipeline on a set of files in a hardcoded directory ("data/input") and write out files
with the extracted CUIs to another hardcoded directory ("data/output"). That isn't exactly
what you want, but if you do need to do some development, you can copy that class and extend
it for your own uses. That is probably the route that requires the least effort.


On 10/09/2013 06:30 PM, digital girl wrote:
Hi Tim,

Thanks for the prompt response.

For starters, what I'd like to do is process several hundred clinical narratives and extract
the key items per narrative (CUI, RxNorm, symptoms, relationships, smoking status, etc.) for
structured classification in a database. Since I'm looking at a collection of narratives
for processing, I see that the CPE tool would be ideal.

You stated that "for more systematic access and processing we usually will write java code
around an annotator that will use the ctakes API and typesystem to extract what we need."
I'm currently using the user tool, so I'm guessing that I will need to graduate to the developer
version in order to do what you stated.

I appreciate your feedback.



Hi Paula,
The typical way we visually inspect this information is in the CVD tool. Then for more systematic
access and processing we usually will write java code around an annotator that will use the
ctakes API and typesystem to extract what we need. Looking directly at the xml is usually
about as useful as you seem to have found it (i.e., not very :) ).

What task are you trying to accomplish? If you just want to see what concepts are found for
one file at a time that can be done in the CVD. If you are having trouble finding what you
need there let us know. If you want an output file with all terms that were listed in any
given input file that would probably require a little bit of programming.


From: cybersation@hotmail.com<mailto:cybersation@hotmail.com>
To: user@ctakes.apache.org<mailto:user@ctakes.apache.org>
Subject: Analyzing and processing cTakes NLP output
Date: Tue, 8 Oct 2013 21:00:17 -0400

Hi Team (or is it just the Team of Samir ;-)

I processed a 2 1/2 page narrative in the CVD tool and exported it to an XCAS file in XML.
I would like to extract the key items from the narrative that cTakes is known for, such as
identifying diseases/disorders, medications, signs/symptoms, and so forth. I quickly perused
the file in an XML browser and did see the associated SNOMED and RXNORM codes. I decided to
print out the file to mark up the sections and to get an idea of how these codes relate
back to the concepts identified by cTakes. My printer ran out of paper after about 60 pages,
and when I looked at the top sheet it was 1 out of 2243 pages! A 2 1/2 page narrative resulted
in an XML file of over two thousand pages!!!

I examined the first medication mapping. The numbered lines are my comments and everything
else is copied and pasted from the XCAS file.

1. The RxNorm code is identified as 69749, but it's not associated with a concept, so I copied
'163573' and searched for it in the XML file. See number 2 below for what that retrieved.
<uima.cas.FSArray _id="163573" size="1">
<org.apache.ctakes.typesystem.type.refsem.OntologyConcept _id="163539" codingScheme="RXNORM"
code="69749" oid="69749#RXNORM"/>

2. Retrieved this result with some additional information, such as generic being false,
but no medication name mention. I copied "163581" and searched for it. See number 3 below for
what that retrieved.
_indexed="1" _id="163581" _ref_sofa="6" begin="9776" end="9784" id="530" _ref_ontologyConceptArr="163573"
typeID="1" segmentID="SIMPLE_SEGMENT" discoveryTechnique="1" confidence="1.0" polarity="1"
uncertainty="0" conditional="false" generic="false" subject="patient" historyOf="0"/>

3. Retrieved this result. The entry associated with the RxNorm code identifies Coumadin as a treatment.
<org.apache.ctakes.assertion.medfacts.types.Concept _indexed="1" _id="227307" _ref_sofa="6"
begin="9776" end="9784" conceptType="TREATMENT" conceptText="Coumadin" externalId="0" originalEntityExternalId="163581"/>

Here are my questions:

1. Are there any resources available that explain what the XML output file contains and its
layout? For example, what do a confidence of 1.0, a polarity of 1, and an uncertainty of 0 refer to?
2. Are there any existing tools that interpret the NLP output from cTakes and automatically
structure it and associate it with the concepts? For example, a tool that automatically associates
the RxNorm code with the medication mention, as illustrated above. As you can see, it took a few
steps to associate the RxNorm code with the actual medication mention from the narrative.
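
The three manual lookups above amount to a join on the _id attributes, which can be
scripted. What follows is only a rough sketch in Python using the standard library, not an
existing cTAKES tool. It makes two assumptions that are not confirmed by the paste: that an
FSArray lists the ids of its members in <i> child elements, and that any element carrying
a _ref_ontologyConceptArr attribute is a mention (the tag name "example.Mention" in the
sample is made up, because the real element name is cut off in item 2). The function name
link_codes is illustrative as well.

```python
import xml.etree.ElementTree as ET

# Tiny XCAS-like fragment. Attribute values come from the message above;
# the <i> child and the "example.Mention" tag are assumptions for illustration.
SAMPLE = """<CAS>
  <uima.cas.FSArray _id="163573" size="1"><i>163539</i></uima.cas.FSArray>
  <org.apache.ctakes.typesystem.type.refsem.OntologyConcept _id="163539"
      codingScheme="RXNORM" code="69749" oid="69749#RXNORM"/>
  <example.Mention _id="163581" begin="9776" end="9784"
      _ref_ontologyConceptArr="163573" polarity="1"/>
  <org.apache.ctakes.assertion.medfacts.types.Concept _id="227307"
      begin="9776" end="9784" conceptType="TREATMENT" conceptText="Coumadin"
      originalEntityExternalId="163581"/>
</CAS>"""


def link_codes(xcas_text):
    """Return one dict per (mention, ontology code) pair found in the XCAS text."""
    root = ET.fromstring(xcas_text)

    # Index every element that has an _id so references can be resolved.
    by_id = {el.get("_id"): el for el in root.iter() if el.get("_id")}

    # Map a mention's _id to the text recorded by the medfacts Concept (item 3).
    text_for = {
        el.get("originalEntityExternalId"): el.get("conceptText")
        for el in root.iter()
        if el.get("originalEntityExternalId") and el.get("conceptText")
    }

    rows = []
    for mention in root.iter():
        array_id = mention.get("_ref_ontologyConceptArr")
        array = by_id.get(array_id) if array_id else None
        if array is None:
            continue  # not a mention, or the referenced array is missing
        # Assumed layout: each <i> child holds the _id of one OntologyConcept.
        for item in array.findall("i"):
            concept = by_id.get((item.text or "").strip())
            if concept is None:
                continue
            rows.append({
                "text": text_for.get(mention.get("_id")),
                "begin": int(mention.get("begin")),
                "end": int(mention.get("end")),
                "scheme": concept.get("codingScheme"),
                "code": concept.get("code"),
            })
    return rows


if __name__ == "__main__":
    for row in link_codes(SAMPLE):
        print(row)
```

For a real export, the file contents would be read and passed to link_codes in place of
SAMPLE. If the actual XCAS layout differs from the assumptions above, the array lookup is
the part that would need adjusting.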


