ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: Best combination of analysis engines to consider negation, family history, uncertainty, etc.
Date Wed, 19 Oct 2016 15:11:40 GMT
I can second Sean's thank you; it is good to have this feedback. The ClearTK machine learning
models were made the default after we ran experiments showing that they performed better
across a range of standard datasets than rule-based algorithms or the existing cTAKES module
(http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774). Since making them
the default, though, we have had reports from other users, and our own experience, that conflict
with those experiments. And the errors in the rule-based system are certainly easier to understand.

Just curious: are you able to characterize the errors you see from the ClearTK system? I recently
ran some experiments on a new dataset comparing NegEx with the ClearTK negation module
and found a precision/recall tradeoff but almost identical F1 scores. For that dataset, though,
our collaborators preferred the tradeoff NegEx provided. (I think NegEx had better recall of
negated terms but worse precision.)
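To make the "tradeoff with almost identical F1" point concrete: F1 is the harmonic mean of precision and recall, 2PR/(P+R), which in confusion counts is 2TP/(2TP+FP+FN). Two systems can therefore trade precision for recall and still land on nearly the same F1. The sketch below uses made-up confusion counts (hypothetical, not the numbers from the experiment mentioned above) for a dataset with 100 gold negated mentions; the higher-recall profile simply mirrors the NegEx behavior recalled above.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for the 'negated' class against a gold standard."""
    tp: int  # negated in gold, predicted negated
    fp: int  # not negated in gold, predicted negated
    fn: int  # negated in gold, predicted not negated

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written directly in counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# HYPOTHETICAL counts on a dataset with 100 gold negated mentions
# (tp + fn == 100 for both); illustrative only, not reported results.
precise = Counts(tp=70, fp=8, fn=30)     # fewer false alarms, more misses
recalling = Counts(tp=85, fp=31, fn=15)  # more hits, more false alarms

for name, c in [("higher-precision", precise), ("higher-recall", recalling)]:
    print(f"{name}: P={c.precision:.3f} R={c.recall:.3f} F1={c.f1:.3f}")
```

With these counts both profiles come out at F1 ≈ 0.787 (P 0.897 / R 0.700 versus P 0.733 / R 0.850), which is why F1 alone cannot tell you which tradeoff a given group of users will prefer.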


From: Finan, Sean <Sean.Finan@childrens.harvard.edu>
Sent: Wednesday, October 19, 2016 10:53 AM
To: dev@ctakes.apache.org
Subject: RE: Best combination of analysis engines to consider negation, family history, uncertainty,

Hi Yiming,

Thank you very much for letting the community know what has and has not worked for you.  I
have also had better results with the Assertion annotators than the ClearTk alternatives,
but that could be because of the note types/formats that I am using.

Regarding the "Clear" in the names: it is there because ClearTk (Clear ToolKit) is used to train the machine
learning models that detect the indicated property. You can find information on ClearTk
starting here:  https://urldefense.proofpoint.com/v2/url?u=http-3A__clear.colorado.edu_compsem_&d=DQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=aRk0CH-2UrNpH0F4PgdnzixY-xVsh8OYTCP8mhe27Gw&s=0mEmiKK5adFN2YCkYyNCNM3Cv4FNWlMbN8XU6GtcQP4&e=

If you prefer to read a paper, you can check out https://urldefense.proofpoint.com/v2/url?u=http-3A__www.lrec-2Dconf.org_proceedings_lrec2014_pdf_218-5FPaper.pdf&d=DQIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=aRk0CH-2UrNpH0F4PgdnzixY-xVsh8OYTCP8mhe27Gw&s=T-pZCKB6BckhHzvYc9gyutCmKQlhitdO_-i4e387tjM&e=

Others on the dev list can provide much more information than I can, so feel free to post a question
if you like.



-----Original Message-----
From: Zuo Yiming [mailto:yimingzuo@gmail.com]
Sent: Wednesday, October 19, 2016 10:04 AM
To: user@ctakes.apache.org; dev@ctakes.apache.org
Subject: Best combination of analysis engines to consider negation, family history, uncertainty,

Hi everyone,

I've spent the last few months working on a clinical NLP project using cTAKES. It's a very
complex system to me, and every time I dig into it I discover something new. Since
last week, I have been trying to figure out which analysis engines do a good job of handling
cases like negation, family history, uncertainty, etc. I now have some experience and would
like to share it with the community.

The best combination for me is assertionMiniPipelineAnalysisEngine for negation, uncertainty,
generic and subject detection, plus HistoryCleartkAnalysisEngine for history detection. Both
engines are in the desc/ctakes-assertion folder. The assertionMiniPipelineAnalysisEngine
is also described as useful for conditional detection, which I haven't verified on my test
files yet.

I'm using AggregatePlaintextFastUMLSProcessor as the top-level aggregate. Its default analysis
engines for negation, uncertainty, generic, etc. are
StatusAnnotator + NegationAnnotator + PolarityCleartkAnalysisEngine + SubjectCleartkAnalysisEngine
+ UncertaintyCleartkAnalysisEngine + GenericCleartkAnalysisEngine + HistoryCleartkAnalysisEngine.
It looks like StatusAnnotator and NegationAnnotator are commented out in the descriptor's node section,
so only the remaining five analysis engines are actually used, and all of them are in the same
desc/ctakes-assertion folder. These five analysis engines were not effective on my test files,
and I'm still confused about how they relate to the assertionAnalysisEngine, conceptConverterAnalysisEngine,
GenericAttributeAnalysisEngine and SubjectAttributeAnalysisEngine used in assertionMiniPipelineAnalysisEngine.
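One way to confirm which delegates an aggregate really runs, given that some entries are commented out, is to list the active node entries of its fixed flow. Below is a minimal Python sketch, assuming the descriptor is a standard UIMA aggregate that orders delegates with fixedFlow/node elements under the http://uima.apache.org/resourceSpecifier namespace. The SAMPLE fragment and the function names are hypothetical, and the delegate keys in the real AggregatePlaintextFastUMLSProcessor descriptor may differ.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def active_flow_nodes(descriptor_xml: str) -> list[str]:
    """Return the delegate keys listed under <fixedFlow>, in order.

    ElementTree's default parser discards XML comments, so any
    <node> that has been commented out simply does not appear.
    """
    root = ET.fromstring(descriptor_xml)
    active = []
    for elem in root.iter():
        if local_name(elem.tag) != "fixedFlow":
            continue
        for node in elem:
            if local_name(node.tag) == "node" and node.text:
                active.append(node.text.strip())
    return active


# HYPOTHETICAL, heavily trimmed descriptor fragment; a real cTAKES
# aggregate has many more delegates and its keys may differ.
SAMPLE = """<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <analysisEngineMetaData>
    <flowConstraints>
      <fixedFlow>
        <!-- <node>StatusAnnotator</node> -->
        <!-- <node>NegationAnnotator</node> -->
        <node>PolarityCleartkAnalysisEngine</node>
        <node>SubjectCleartkAnalysisEngine</node>
        <node>UncertaintyCleartkAnalysisEngine</node>
        <node>GenericCleartkAnalysisEngine</node>
        <node>HistoryCleartkAnalysisEngine</node>
      </fixedFlow>
    </flowConstraints>
  </analysisEngineMetaData>
</analysisEngineDescription>"""

if __name__ == "__main__":
    for key in active_flow_nodes(SAMPLE):
        print(key)
```

Pointing this at a real descriptor would just mean reading the file's text first (for example with pathlib.Path.read_text); no path is given here because it depends on the local cTAKES installation.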

It looks to me like the "Clear" in their names indicates something, but I couldn't figure out what
without going through the Java code, which I'd rather not do at this level.

That's pretty much all for now. Anyone familiar with this topic is welcome to jump
in with insights or corrections. Hopefully we can have a good discussion that will
be useful to other users and developers.

P.S. The reason for using AggregatePlaintextFastUMLSProcessor rather than AggregatePlaintextProcessor
is that I find the preferred words property in the former very useful, and I can't get it
from the latter.




Yiming Zuo <https://urldefense.proofpoint.com/v2/url?u=https-3A__sites.google.com_site_yimingzuo_&d=DQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=4at7fOO27JCueBfJFn7Hv2vKWlUAK-nuYYdmMyGRJPQ&s=vSmSOvLXuCa-Pwp8qu05VTzZgGA0P3Y2CL8q3JBhppQ&e=>
Georgetown U. Medical Center:

Dr. Ressom's Omics Lab <https://urldefense.proofpoint.com/v2/url?u=http-3A__omics.georgetown.edu_&d=DQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=4at7fOO27JCueBfJFn7Hv2vKWlUAK-nuYYdmMyGRJPQ&s=yNsVaS7s20e-125SmdmQqKHvQ0lAQ7si98GefPRDxT0&e=>
ECE Department of Virginia Tech:

Computational Bioinformatics & Bio-imaging Laboratory <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.cbil.ece.vt.edu_&d=DQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=4at7fOO27JCueBfJFn7Hv2vKWlUAK-nuYYdmMyGRJPQ&s=DpORI1TH9yITkdlRX_RLjxejH2jMJUq8yFaTPjWAar4&e=>
