ctakes-dev mailing list archives

From Tom Devel <deve...@gmail.com>
Subject Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml
Date Tue, 21 Apr 2015 17:04:21 GMT
After further testing: when I remove the <node>NegationAnnotator</node> step from
ProductionPostSentenceAggregate_step2_libsvm.xml (which I assume is the sub
smoking desc xml you mean), the smoking status is no longer classified
correctly when negations are present, so this step does not look redundant to
me.


For example, "He denied use of tobacco" is then classified as
CURRENT_SMOKER. If I leave this negation step in, it is correctly found as
NON_SMOKER.


I also tried changing where the smoking status nodes
<node>SentenceAdjuster</node> and <node>ClassifiableEntriesAnnotator</node>
run in the clinical pipeline: putting them either directly after lvg or at the
end of the flow does not change the observation above.


However, you said that leaving the NegationAnnotator in could overwrite
assertion values. How can this be prevented while keeping the smoking
status classifications correct?

On Mon, Apr 20, 2015 at 2:02 PM, Chen, Pei <Pei.Chen@childrens.harvard.edu>
wrote:

> Great. There is a redundant Negation step in one of the final sub smoking desc
> xml's.
> Leave the Jira as a placeholder to clean up the smoking status desc's.
>
> Sent from my iPhone
>
> > On Apr 20, 2015, at 1:11 PM, Tom Devel <develxy@gmail.com> wrote:
> >
> > Pei,
> >
> > I did what you recommended: I ran a test input through this new pipeline
> > and through the clinical pipeline without the smoking status, and diffed
> > the two CAS files. It seems to do the trick: the UMLS concept tags are
> > still the same, and there is now a new tag for the smoking status
> > annotation. Great!
> >
> > Before I create the Jira item, what do you mean by removing the last
> > NegEx?
> >
> > In AggregatePlaintextFastUMLSProcessor, the node of the NegationAnnotator
> > is commented out:
> > <!-- <node>NegationAnnotator</node> -->
> >
> > Did you mean this node?
> >
> > At the top of the file, there is an import for the NegationAnnotator:
> > <delegateAnalysisEngine key="NegationAnnotator">, but it is not commented
> > out, even though the node is never run in the fixed flow.
> >
> > Am I correct that the negation detection in the clinical pipeline is now
> > performed by PolarityCleartkAnalysisEngine?
> >
> > Thanks,
> > Tom
> >
> >> On Sat, Apr 18, 2015 at 12:53 AM, Pei Chen <chenpei@apache.org> wrote:
> >>
> >> Tom,
> >> I would put it at the end of the pipeline (at a minimum, it should come
> >> after the sectionizer, sentence, tokenizer, and lvg). I would remove
> >> ExternalBaseAggregateTAE, as it simulates the sectionizer, sentence,
> >> tokenizer, and lvg steps, which would be redundant. I would also probably
> >> remove the last NegEx, which could override the assertion values.
> >>
> >> Disclaimer: I have not tested this yet. Feel free to open a Jira item if
> >> it works for you so it can be tracked. It seems kind of strange to have a
> >> descriptor xml define another xml descriptor that is loaded up via code
> >> again; I think this could be simplified.
> >> --Pei
> >>
> >>> On Thu, Apr 16, 2015 at 7:29 PM, Tom Devel <develxy@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am using the smoking status AE from SimulatedProdSmokingTAE.xml. It
> >>> works fine; I can see the smoking status annotation in the CVD.
> >>>
> >>> Now I would like to include the smoking status detection in the clinical
> >>> pipeline of AggregatePlaintextFastUMLSProcessor.xml, so that when I run
> >>> the clinical pipeline, the smoking status is also determined.
> >>>
> >>> How can I do this?
> >>>
> >>> I am thinking of just putting the nodes from the fixed flow of
> >>> SimulatedProdSmokingTAE.xml into the fixed flow of
> >>> AggregatePlaintextFastUMLSProcessor.xml. Is this the right approach?
> >>>
> >>> If so, at which exact place in the clinical pipeline fixed flow should
> >>> these nodes be added?
> >>>
> >>> Is there a preferred place (such as appending them after the last node
> >>> or putting them before the first node)?
> >>>
> >>> Can a wrong position or ordering of the smoking status nodes
> >>> damage/corrupt the rest of the annotations?
> >>>
> >>> SimulatedProdSmokingTAE.xml contains these lines with the fixed flow:
> >>>
> >>> <fixedFlow>
> >>> <node>ExternalBaseAggregateTAE</node>
> >>> <node>SentenceAdjuster</node>
> >>> <node>ClassifiableEntriesAnnotator</node>
> >>> </fixedFlow>
> >>>
> >>> AggregatePlaintextFastUMLSProcessor.xml (3.2.2 from SVN) contains this
> >>> fixed flow:
> >>>
> >>> <fixedFlow>
> >>> <node>SimpleSegmentAnnotator</node>
> >>> <node>SentenceDetectorAnnotator</node>
> >>> <node>TokenizerAnnotator</node>
> >>> <node>LvgAnnotator</node>
> >>> <node>ContextDependentTokenizerAnnotator</node>
> >>> <node>POSTagger</node>
> >>> <!-- <node>ClearPOSTagger</node> -->
> >>> <node>Chunker</node>
> >>> <node>AdjustNounPhraseToIncludeFollowingNP</node>
> >>> <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
> >>> <!--<node>LookupWindowAnnotator</node>-->
> >>> <node>DictionaryLookupAnnotatorDB</node>
> >>> <node>DrugNER</node>
> >>> <node>DependencyParser</node>
> >>> <node>SemanticRoleLabeler</node>
> >>> <node>ConstituencyParser</node>
> >>> <!-- <node>AssertionAnnotator</node> -->
> >>> <!-- <node>StatusAnnotator</node> -->
> >>> <!-- <node>NegationAnnotator</node> -->
> >>> <node>GenericCleartkAnalysisEngine</node>
> >>> <node>HistoryCleartkAnalysisEngine</node>
> >>> <node>PolarityCleartkAnalysisEngine</node>
> >>> <node>SubjectCleartkAnalysisEngine</node>
> >>> <node>UncertaintyCleartkAnalysisEngine</node>
> >>>
> >>> <node>ExtractionPrepAnnotator</node>
> >>> </fixedFlow>
> >>>
> >>> Thanks for any help or pointers,
> >>>
> >>> Tom
> >>
>
