ctakes-user mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: ctakes concept and relation extraction
Date Tue, 02 Aug 2016 13:11:01 GMT
I don't know whether there is a single pipeline that does both concepts
and relations. If not, you will have to use uimaFIT calls to add additional
extractors to the fast pipeline descriptor you are currently getting.

You may want "IdentifiedAnnotation" and its subclasses as your type
because each one has a definite span. Each IA may correspond to a number
of different concepts in the UMLS dictionary, so we have a data structure
that contains all the matches for a given span. That is the FSArray (a
UIMA data type; the name stands for feature structure array). The UMLS
dictionary annotators create UmlsConcept instances in that array
based on the results of the dictionary lookup.
Finding the "best" one for any span is not something that cTAKES will do
for you; it probably depends on your application. Sometimes we output
them all, sometimes we output the first one. You may need to dig in to
see how many of them are relevant and filter against the subset of things
you are looking for.
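
To make the FSArray idea concrete, here is a rough Python sketch (standard
library only) that follows the <i> entries of an FSArray back to the
UmlsConcept elements they point at, then filters them against a set of CUIs.
The XML is the fragment from the quoted message below with some attributes
trimmed; the <CAS> wrapper element, the function names, and the idea of
filtering by CUI are my assumptions for illustration, not something cTAKES
provides.

```python
import xml.etree.ElementTree as ET

# Fragment taken from the quoted output (attributes trimmed). The <CAS> root
# is an assumed wrapper so that the snippet parses on its own.
XML = """<CAS>
  <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4503"
      codingScheme="SNOMEDCT" code="64033007" cui="C0022646" tui="T023"
      preferredText="Kidney"/>
  <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4493"
      codingScheme="SNOMEDCT" code="17373004" cui="C0227665" tui="T023"
      preferredText="Both kidneys"/>
  <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4483"
      codingScheme="SNOMEDCT" code="181414000" cui="C1278978" tui="T023"
      preferredText="Entire kidney"/>
  <uima.cas.FSArray _id="4513" size="3">
    <i>4483</i>
    <i>4493</i>
    <i>4503</i>
  </uima.cas.FSArray>
</CAS>"""


def concepts_by_array(root):
    """Map each FSArray _id to the UmlsConcept attribute dicts it lists."""
    by_id = {el.get("_id"): el for el in root}  # top-level elements by _id
    result = {}
    for arr in root.iter("uima.cas.FSArray"):
        members = [by_id[i.text] for i in arr.findall("i") if i.text in by_id]
        result[arr.get("_id")] = [
            m.attrib for m in members if m.tag.endswith("UmlsConcept")
        ]
    return result


def pick(concepts, wanted_cuis):
    """Keep only concepts whose CUI is in the application's own subset."""
    return [c for c in concepts if c["cui"] in wanted_cuis]


arrays = concepts_by_array(ET.fromstring(XML))
for c in pick(arrays["4513"], {"C0022646"}):
    print(c["cui"], c["preferredText"])
```

Which CUIs (or TUIs, or coding schemes) go into the filter is the
application-specific part; the sketch only shows how the array and the
concepts fit together.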


Looks like the word "kidney" is indeed in the input:

> human embryo kidney 293T cells

cTAKES will find mentions even when they are modifiers inside larger phrases.


Finally, I would not try to interpret the UIMA XML manually. I would use
the UIMA CVD (CAS Visual Debugger) to read the .xmi files that cTAKES
outputs (I believe they should be XMI).
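
If it helps to see what CVD is resolving for you, the references in the
relation output appear to chain like this: a relation's _ref_arg1/_ref_arg2
point at RelationArgument elements, and each RelationArgument's _ref_argument
points at an annotation that carries begin/end offsets into the document
text. The Python sketch below walks that chain. Note the assumptions: the
note text, the two mention elements, and their offsets are made up for
illustration (the quoted output does not show what ids 10680 and 10989
actually are), and the <CAS> wrapper and the function name are mine.

```python
import xml.etree.ElementTree as ET

# Made-up note text and mention elements (assumptions for illustration);
# the RelationArgument and LocationOfTextRelation elements mirror the
# quoted output with some attributes trimmed.
TEXT = "pain in the left kidney"
XML = """<CAS>
  <org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
      _id="10680" begin="0" end="4"/>
  <org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
      _id="10989" begin="17" end="23"/>
  <org.apache.ctakes.typesystem.type.relation.RelationArgument
      _id="12422" _ref_argument="10680" role="Argument"/>
  <org.apache.ctakes.typesystem.type.relation.RelationArgument
      _id="12427" _ref_argument="10989" role="Related_to"/>
  <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
      _id="12432" category="location_of"
      _ref_arg1="12422" _ref_arg2="12427"/>
</CAS>"""


def relations_as_text(root, text):
    """Return (arg1 text, category, arg2 text) for each *TextRelation."""
    by_id = {el.get("_id"): el for el in root}

    def covered(arg_id):
        # RelationArgument -> annotation -> substring of the document text
        annotation = by_id[by_id[arg_id].get("_ref_argument")]
        return text[int(annotation.get("begin")):int(annotation.get("end"))]

    return [
        (covered(el.get("_ref_arg1")), el.get("category"),
         covered(el.get("_ref_arg2")))
        for el in root
        if el.tag.endswith("TextRelation")
    ]


triples = relations_as_text(ET.fromstring(XML), TEXT)
print(triples)
```

For real output I would still open the file in CVD rather than rely on a
script like this.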

Tim


On Tue, 2016-08-02 at 14:13 +0200, Niraj Shrestha wrote:
> Dear Sir
> I am trying to extract named entities and their relations from medical
> documents. If I understood correctly, concepts are basically entities.
> I have used two different analysis engines:
>      AggregatePlaintextFastUMLSProcessor.xml for concept extraction
> and 
> 
>      RelationExtractorAggregate for relation extraction. 
> 
> 
> My first question is how I can combine both engines to obtain concepts
> and relations in a single file.
> 
> 
> If I understood correctly, to extract all the entities
> (concepts) I need to get all the
> "org.apache.ctakes.typesystem.type.refsem.UmlsConcept" nodes from the
> output xml file. But how can I choose a single entity or concept from a
> list of many concepts?
> 
> 
> And what is the FSArray in which all the concept ids are listed?
> 
> 
> I found that some concepts are not mentioned in the input data but
> appear in the output data. For example, when I use the following engine
> on the "note.txt" file
> 
> 
> <import
> location="../analysis_engine/AggregatePlaintextFastUMLSProcessor.xml"/>
> 
> output file is "note.txt4.xml" (attached here)
> 
> 
> One of the concepts is the following, where "Kidney" is given as
> preferredText but the word "kidney" is not found in the input data.
> 
> 
> <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4503"
> codingScheme="SNOMEDCT" code="64033007" oid="64033007#SNOMEDCT"
> score="0.0" disambiguated="false" cui="C0022646" tui="T023"
> preferredText="Kidney"/>
>     <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4493"
> codingScheme="SNOMEDCT" code="17373004" oid="17373004#SNOMEDCT"
> score="0.0" disambiguated="false" cui="C0227665" tui="T023"
> preferredText="Both kidneys"/>
>     <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4483"
> codingScheme="SNOMEDCT" code="181414000" oid="181414000#SNOMEDCT"
> score="0.0" disambiguated="false" cui="C1278978" tui="T023"
> preferredText="Entire kidney"/>
>     <uima.cas.FSArray _id="4513" size="3">
>         <i>4483</i>
>         <i>4493</i>
>         <i>4503</i>
>     </uima.cas.FSArray>
> 
> 
> 
> 
> ************************************
> My next query concerns relation extraction, for which I use the
> following engine.
> 
> 
> <import
> location="../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml"/>
> 
> output file is "note.txt_relation.xml" (attached here)
> 
> 
> I am not able to interpret the output file (note.txt_relation.xml), in
> which relations and their locations are listed. I could not figure
> out which entities are involved and what the relation between them is
> in terms of words.
> 
> 
> For example:
> 
> 
> <org.apache.ctakes.typesystem.type.relation.RelationArgument
> _indexed="1" _id="12422" id="0" _ref_argument="10680"
> role="Argument"/>
>     <org.apache.ctakes.typesystem.type.relation.RelationArgument
> _indexed="1" _id="12427" id="0" _ref_argument="10989"
> role="Related_to"/>
>     <org.apache.ctakes.typesystem.type.relation.RelationArgument
> _indexed="1" _id="12446" id="0" _ref_argument="10680"
> role="Argument"/>
> .
> .
> .
> .
> <org.apache.ctakes.typesystem.type.relation.RelationArgument
> _indexed="1" _id="12851" id="0" _ref_argument="12181"
> role="Related_to"/>
>     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> _indexed="1" _id="12432" id="0" category="location_of"
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> conditional="false" _ref_arg1="12422" _ref_arg2="12427"/>
>     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> _indexed="1" _id="12456" id="0" category="location_of"
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> conditional="false" _ref_arg1="12446" _ref_arg2="12451"/>
>     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> _indexed="1" _id="12480" id="0" category="location_of"
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> conditional="false" _ref_arg1="12470" _ref_arg2="12475"/>
>     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> _indexed="1" _id="12508" id="0" category="location_of"
> discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> conditional="false" _ref_arg1="12498" _ref_arg2="12503"/>
> 
> 
> 
> 
> Sorry for so many long queries at once.
> 
> 
> Thanks a lot in advance for your suggestions.
> 
> 
> With regards,
> Shrestha
> 
> 
> 
> 

