ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: cTAKES false positives, case-insensitivity
Date Wed, 01 Jun 2016 15:49:17 GMT
Hi Tomasz,

cTAKES dictionary lookup (both the original and the fast lookup) is case-insensitive by design. There have been brief
discussions about changing this behavior, but capitalized dictionary entries, list headings,
and ordinary sentence-initial capitalization have kept it from being implemented.
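To make the behavior concrete, here is a minimal sketch of a case-insensitive lookup, assuming a simple map from lower-cased term to CUI. The class and method names are hypothetical and this is not the actual cTAKES lookup code; it only shows why "aids" and "all" hit the acronym entries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the cTAKES implementation: dictionary keys and
// document tokens are both lower-cased, so matching ignores case.
public class CaseInsensitiveLookup {
    private final Map<String, String> cuiByTerm = new HashMap<>();

    /** Adds a dictionary entry, storing its key in lower case. */
    public void addTerm(String term, String cui) {
        cuiByTerm.put(term.toLowerCase(Locale.ROOT), cui);
    }

    /** Returns the CUI for a token regardless of its case, or null if absent. */
    public String lookup(String token) {
        return cuiByTerm.get(token.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CaseInsensitiveLookup dict = new CaseInsensitiveLookup();
        dict.addTerm("AIDS", "C0001175");
        dict.addTerm("ALL", "C1961102");
        System.out.println(dict.lookup("aids")); // "hearing aids" matches the AIDS entry
        System.out.println(dict.lookup("all"));  // "are all stable" matches the ALL entry
    }
}
```

A case-sensitive map would reject "aids", but a term stored in lower case would then miss its capitalized form at the start of a sentence or in an all-caps list heading, which is the trade-off described above.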

One area of strong interest in the community is word sense disambiguation, which would allow terms
to be culled when they are unlikely to fit their context.
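Real word sense disambiguation needs a trained model, but the idea of culling by context can be illustrated with a much cruder stand-in. The sketch below is a hypothetical rule, not a cTAKES component: it drops a candidate when a word shortly before it is a cue for a non-clinical sense, and the cue table is invented for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for word sense disambiguation (hypothetical, not part of cTAKES):
// a candidate is culled when a preceding word is a cue for a non-clinical sense.
public class ContextCull {
    // Hypothetical cue table: words that, just before the term, suggest the
    // everyday meaning rather than the disease acronym.
    private static final Map<String, Set<String>> NON_CLINICAL_CUES =
        Map.of("aids", Set.of("hearing", "visual", "walking"));

    /** Returns true if tokens.get(index) should be kept as a concept mention. */
    public static boolean keep(List<String> tokens, int index, int window) {
        Set<String> cues = NON_CLINICAL_CUES.getOrDefault(
            tokens.get(index).toLowerCase(Locale.ROOT), Set.of());
        // Look back at most `window` tokens for a non-clinical cue.
        for (int j = Math.max(0, index - window); j < index; j++) {
            if (cues.contains(tokens.get(j).toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> s1 = List.of("Pt", "uses", "hearing", "aids", ".");
        List<String> s2 = List.of("Pt", "has", "AIDS", ".");
        System.out.println(keep(s1, 3, 2)); // false: "hearing" precedes "aids"
        System.out.println(keep(s2, 2, 2)); // true: no cue in the window
    }
}
```

A hand-written cue table does not scale beyond a few terms; that limitation is what a statistical disambiguation model is meant to remove.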

Culling could also be based on how often a term normally appears in ordinary text. Or
you could create an annotation engine that culls based on some other requirement, such as
semantic type.
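An annotation engine that culls matches could apply both of these criteria. The sketch below works on a hypothetical, simplified Mention record rather than real cTAKES/UIMA types, and its list of high-frequency words is a placeholder for corpus-derived counts; in a pipeline the same logic would run over the annotations produced by the dictionary lookup.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class MentionCuller {
    /** Hypothetical, simplified stand-in for a dictionary-lookup annotation. */
    public record Mention(String text, String cui, String type) {}

    /**
     * Drops mentions whose surface text is a high-frequency English word or
     * whose type is not in the allowed set.
     */
    public static List<Mention> cull(List<Mention> mentions,
                                     Set<String> frequentWords,
                                     Set<String> allowedTypes) {
        return mentions.stream()
            .filter(m -> !frequentWords.contains(m.text().toLowerCase(Locale.ROOT)))
            .filter(m -> allowedTypes.contains(m.type()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Mention> found = List.of(
            new Mention("aids", "C0001175", "DiseaseDisorderMention"),
            new Mention("all", "C1961102", "DiseaseDisorderMention"));
        // Hypothetical frequency list; a real one would come from corpus counts.
        Set<String> frequentWords = Set.of("all", "and", "the", "was");
        Set<String> allowedTypes = Set.of("DiseaseDisorderMention");
        System.out.println(cull(found, frequentWords, allowedTypes));
    }
}
```

Frequency culling removes "all" here but not "aids", which still needs a context or length check. The record syntax requires Java 16 or later.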

For your two examples, you can prevent many false-positive acronyms and abbreviations
by raising the minimum character count required for a term. To do this, set
the UIMA parameter "minimumSpan" to 5: this drops "AIDS" (4 characters) and "ALL" (3 characters) but keeps "APSGN" (5 characters). You can
set it with the older XML descriptor style or with uimaFIT, something like:

    AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class,
          JCasTermAnnotator.PARAM_MIN_SPAN_KEY, 5 )
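The effect of the cutoff can be sketched as a plain character-length filter. This is a hypothetical illustration, assuming minimumSpan is compared against the character length of the matched text; it is not the cTAKES implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of a minimum-span cutoff, not cTAKES code.
public class MinimumSpanFilter {
    /** Keeps only candidate terms whose text is at least minimumSpan characters long. */
    public static List<String> filter(List<String> candidates, int minimumSpan) {
        return candidates.stream()
            .filter(text -> text.length() >= minimumSpan)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("aids", "all", "APSGN");
        System.out.println(filter(candidates, 5)); // [APSGN]
    }
}
```

With a cutoff of 5, both of your false positives ("aids" and "all") are dropped, but so are genuine mentions of the acronyms AIDS and ALL, so the cutoff trades recall on short terms for precision.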

Sean


-----Original Message-----
From: Tomasz Oliwa [mailto:oliwa@uchicago.edu] 
Sent: Wednesday, June 01, 2016 11:28 AM
To: dev@ctakes.apache.org
Subject: cTAKES false positives, case-insensitivity

Hi,

I have encountered false positives in cTAKES annotations that seem to come from the case-insensitivity
of the dictionary lookup, such as:

Pt uses hearing aids. -> "aids" is found as DiseaseDisorderMention cui=C0001175, Acquired
Immunodeficiency Syndrome

Pt values are all stable. -> "all" is found as DiseaseDisorderMention cui=C1961102, Precursor
Cell Lymphoblastic Leukemia Lymphoma

Are there ways in cTAKES to approach or to resolve such issues?

How do you handle such false positives so that they are not matched?

Regards,
Tomasz
