ctakes-user mailing list archives

From Niraj Shrestha <nshrest...@gmail.com>
Subject Re: ctakes concept and relation extraction
Date Tue, 02 Aug 2016 14:07:05 GMT
Hi Timothy
Thanks for the prompt reply.
Is it possible to use IdentifiedAnnotation in the CPE?
I saw IdentifiedAnnotation in CVD, which selects one concept from the
collection.
I would like to run the CPE because I need to process many documents. I
believe I cannot run CVD on many documents; am I right?

Regards,
Shrestha

On Tue, Aug 2, 2016 at 3:11 PM, Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> I don't know if there is a single pipeline that does concepts and
> relations. If not, you will have to use uimaFIT calls to add additional
> extractors to the fast pipeline descriptor you are currently using.
>
> You may want "IdentifiedAnnotation" and its subclasses as your type
> because it has a definite span. Each IA may correspond to a number of
> different concepts in the UMLS dictionary, so we have a data structure
> that contains all the matches for a given span. That is the FSArray (a
> UIMA data type; the name stands for FeatureStructureArray). The UMLS
> dictionary annotators will create UmlsConcept instances in that array
> based on the results of the dictionary lookup.
> Finding the "best" one for any span is not something that cTAKES will do
> for you; it probably depends on your application. Sometimes we output
> them all, sometimes we output the first one. You may need to dig in to
> see how many of them are relevant and filter against a subset of things
> you are looking for.
>
>
> Looks like the word "kidney" is indeed in the input:
>
> > human embryo kidney 293T cells
>
> cTAKES will find mentions even as modifiers inside larger phrases.
>
>
> Finally, I would not try to interpret UIMA XML manually. I would use the
> UIMA CVD (CAS Visual Debugger) to read the .xmi files that cTAKES outputs.
> (I believe they should be xmi).
>
> Tim
>
>
> On Tue, 2016-08-02 at 14:13 +0200, Niraj Shrestha wrote:
> > Dear Sir
> > I am trying to extract named entities and their relations from medical
> > documents. If I understood correctly, concepts are basically entities.
> > I have used two different analysis engines:
> >      AggregatePlaintextFastUMLSProcessor.xml for concept extraction
> > and
> >
> >      RelationExtractorAggregate for relation extraction.
> >
> >
> > My first question is: how can I combine both engines to obtain concepts
> > and relations in a single file?
> >
> >
> > If I understood correctly, to extract all the entities (concepts) I
> > need to get all the
> > "org.apache.ctakes.typesystem.type.refsem.UmlsConcept" nodes from the
> > output XML file. But how can I choose a single entity or concept from
> > the list of many concepts?
> >
> >
> > Also, what is the FSArray in which all the concept IDs are listed?
> >
> >
> > I found that some concepts are not mentioned in the input data but
> > appear in the output data. For example, when I use the following engine
> > on the "note.txt" file
> >
> >
> > <import
> > location="../analysis_engine/AggregatePlaintextFastUMLSProcessor.xml"/>
> >
> > output file is "note.txt4.xml" (attached here)
> >
> >
> > One of the concepts is the following, where "Kidney" is given as the
> > preferredText but the word "kidney" is not found in the input data.
> >
> >
> > <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4503"
> > codingScheme="SNOMEDCT" code="64033007" oid="64033007#SNOMEDCT"
> > score="0.0" disambiguated="false" cui="C0022646" tui="T023"
> > preferredText="Kidney"/>
> >     <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4493"
> > codingScheme="SNOMEDCT" code="17373004" oid="17373004#SNOMEDCT"
> > score="0.0" disambiguated="false" cui="C0227665" tui="T023"
> > preferredText="Both kidneys"/>
> >     <org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4483"
> > codingScheme="SNOMEDCT" code="181414000" oid="181414000#SNOMEDCT"
> > score="0.0" disambiguated="false" cui="C1278978" tui="T023"
> > preferredText="Entire kidney"/>
> >     <uima.cas.FSArray _id="4513" size="3">
> >         <i>4483</i>
> >         <i>4493</i>
> >         <i>4503</i>
> >     </uima.cas.FSArray>
> >
> >
> >
> >
> > ************************************
> > My next query concerns relation extraction, for which I use the
> > following engine.
> >
> >
> > <import
> > location="../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml"/>
> >
> > output file is "note.txt_relation.xml" (attached here)
> >
> >
> > I am not able to interpret the output file (note.txt_relation.xml). It
> > lists the relations and their locations, but I could not figure out
> > which entities are involved and what the relation between them is in
> > terms of words.
> >
> >
> > For example:
> >
> >
> > <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12422" id="0" _ref_argument="10680"
> > role="Argument"/>
> >     <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12427" id="0" _ref_argument="10989"
> > role="Related_to"/>
> >     <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12446" id="0" _ref_argument="10680"
> > role="Argument"/>
> > .
> > .
> > .
> > .
> > <org.apache.ctakes.typesystem.type.relation.RelationArgument
> > _indexed="1" _id="12851" id="0" _ref_argument="12181"
> > role="Related_to"/>
> >     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12432" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12422" _ref_arg2="12427"/>
> >     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12456" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12446" _ref_arg2="12451"/>
> >     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12480" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12470" _ref_arg2="12475"/>
> >     <org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
> > _indexed="1" _id="12508" id="0" category="location_of"
> > discoveryTechnique="0" confidence="0.0" polarity="0" uncertainty="0"
> > conditional="false" _ref_arg1="12498" _ref_arg2="12503"/>
> >
> >
> >
> >
> > Sorry for so many long queries at once.
> >
> >
> > Thanks a lot in advance for your suggestions.
> >
> >
> > With regards,
> > Shrestha
> >
> >
> >
> >
>
>
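
For readers who want to inspect the XML output without CVD, here is a rough Python sketch of how the FSArray quoted above ties its UmlsConcept entries together: each <i> value is the _id of one concept. The <CAS> wrapper element and the function name concepts_by_array are assumptions made for illustration (the post shows only fragments), and some attributes are trimmed for brevity. This is not part of cTAKES.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the message above (a few attributes trimmed).
# The <CAS> root element is an assumption; the post shows only fragments.
XCAS = """<CAS>
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4503"
  codingScheme="SNOMEDCT" code="64033007" cui="C0022646" tui="T023"
  preferredText="Kidney"/>
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4493"
  codingScheme="SNOMEDCT" code="17373004" cui="C0227665" tui="T023"
  preferredText="Both kidneys"/>
<org.apache.ctakes.typesystem.type.refsem.UmlsConcept _id="4483"
  codingScheme="SNOMEDCT" code="181414000" cui="C1278978" tui="T023"
  preferredText="Entire kidney"/>
<uima.cas.FSArray _id="4513" size="3">
  <i>4483</i>
  <i>4493</i>
  <i>4503</i>
</uima.cas.FSArray>
</CAS>"""


def concepts_by_array(xml_text):
    """Map each FSArray _id to the (cui, preferredText) pairs it lists."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an _id so references can be resolved.
    by_id = {el.get("_id"): el for el in root.iter() if el.get("_id")}
    result = {}
    for arr in root.iter("uima.cas.FSArray"):
        pairs = []
        for item in arr.findall("i"):
            ref = (item.text or "").strip()
            target = by_id.get(ref)
            if target is not None:
                pairs.append((target.get("cui"), target.get("preferredText")))
        result[arr.get("_id")] = pairs
    return result


if __name__ == "__main__":
    for array_id, pairs in concepts_by_array(XCAS).items():
        print(array_id, pairs)
```

All three candidates share the span that matched "kidney"; which one to keep (all, the first, or only those in a list of CUIs you care about) is an application decision, as Tim notes.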

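The relation output quoted above can be read the same way: a relation's _ref_arg1 and _ref_arg2 point to RelationArgument elements, and each argument's _ref_argument points to an annotation whose begin/end offsets index into the note text. The sketch below follows that chain. The two mention elements, their offsets, the note text, and the <CAS> wrapper are hypothetical, since the original attachments are not available; the function name relations_as_text is also made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical note text; the real note.txt is not shown in the thread.
DOC_TEXT = "tumor in the kidney"

# The RelationArgument and LocationOfTextRelation elements follow the post
# (some attributes trimmed). The two mention elements with begin/end offsets
# are hypothetical stand-ins for annotations 10680 and 10989.
XCAS = """<CAS>
<org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
  _id="10680" begin="0" end="5"/>
<org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
  _id="10989" begin="13" end="19"/>
<org.apache.ctakes.typesystem.type.relation.RelationArgument
  _indexed="1" _id="12422" id="0" _ref_argument="10680" role="Argument"/>
<org.apache.ctakes.typesystem.type.relation.RelationArgument
  _indexed="1" _id="12427" id="0" _ref_argument="10989" role="Related_to"/>
<org.apache.ctakes.typesystem.type.relation.LocationOfTextRelation
  _indexed="1" _id="12432" id="0" category="location_of"
  _ref_arg1="12422" _ref_arg2="12427"/>
</CAS>"""


def relations_as_text(xml_text, doc_text):
    """Return (arg1 text, category, arg2 text) for every relation element."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("_id"): el for el in root.iter() if el.get("_id")}

    def covered_text(argument_id):
        # RelationArgument -> annotation -> substring of the note text.
        argument = by_id[argument_id]
        annotation = by_id[argument.get("_ref_argument")]
        return doc_text[int(annotation.get("begin")):int(annotation.get("end"))]

    triples = []
    for el in root.iter():
        arg1, arg2 = el.get("_ref_arg1"), el.get("_ref_arg2")
        if arg1 and arg2:
            triples.append(
                (covered_text(arg1), el.get("category"), covered_text(arg2))
            )
    return triples


if __name__ == "__main__":
    for triple in relations_as_text(XCAS, DOC_TEXT):
        print(triple)
```

On real output, the note text would have to come from the original input file or from wherever the CAS stores it, and a lookup may fail if a referenced _id is missing, so treat this only as a starting point.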