ctakes-user mailing list archives

From: Lance Eason <la...@iodinesoftware.com>
Subject: How to train new models for the ClearTK based assertion analyzers?
Date: Fri, 09 Jan 2015 17:35:29 GMT

The new ClearTK assertion analyzers in 3.2.1 (GenericCleartkAnalysisEngine,
HistoryCleartkAnalysisEngine, etc.) are a welcome change from the
perspective that they're much, much faster than the previous MedFacts
implementation.  Unfortunately, though, I'm finding them significantly less
accurate at flagging the assertion attributes correctly.

Could someone point me in the direction of how to train new models?  I've
found org.apache.ctakes.assertion.train.TrainAttributeModels, which looks
promising, but I can't find the current training data anywhere (I'd like to
use it as the starting point), and without samples I have no idea what
format it expects.

Some real world examples from clinical notes:

I would have expected each of the following to be 'generic' (or maybe
'conditional'), as they refer to a hypothetical future problem.  To be
fair, the previous implementation did no better on these:
   - Gel foam cushion is also required in order to prevent pressure ulcers
from forming, as patient will spend many hours of the day in chair.
   - Ordered wound care protocol and applied mepliex to bottom to prevent
skin breakdown.
   - Encouraged patient to shift weight more frequently to prevent any
pressure ulcers.
   - Showed patient how to use foam to avoid pressure ulcers.
   - Gave her pamphlet about pressure ulcers.
   - I educated patient on causes and prevention of pressure ulcers.
   - Instructed on need to adjust position Q2 hours to avoid pressure

This statement is about as direct an instance of negation as possible, yet
the new models rule it conditional instead of negated:
   - Patient does not have pneumonia.

"Ruled out" is no longer understood as negation:
   - Ruled out pneumonia.

"h/o" is no longer understood as historic ("hx" and "history of" are still
picked up):
   - h/o heart failure.

Trailing conditionals are no longer picked up:
   - "Likely tuberculosis" (found) vs. "Tuberculosis likely" (missed)
   - "Possible tuberculosis" (found) vs. "Tuberculosis is possible" (missed)
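
For anyone who wants to re-check these after retraining, the examples above
can be collected into a small regression list.  This is only a sketch: the
sentences and expected attributes come from this message, but Case, CASES,
failures and toy_classify are hypothetical names, and toy_classify is a
stand-in rather than a call into cTAKES.  A real check would replace it with
a function that runs the pipeline and returns the attributes it flagged.

```python
from typing import Callable, List, NamedTuple, Set


class Case(NamedTuple):
    text: str
    expected: str  # attribute this message says should be flagged


# Sentences and expected attributes are taken from the examples above.
# The pressure-ulcer sentences are listed as 'generic'; the message notes
# that 'conditional' might also be acceptable for them.
CASES = [
    Case("Showed patient how to use foam to avoid pressure ulcers.", "generic"),
    Case("Gave her pamphlet about pressure ulcers.", "generic"),
    Case("Patient does not have pneumonia.", "negated"),
    Case("Ruled out pneumonia.", "negated"),
    Case("h/o heart failure.", "historic"),
    Case("Likely tuberculosis", "conditional"),
    Case("Tuberculosis likely", "conditional"),
    Case("Possible tuberculosis", "conditional"),
    Case("Tuberculosis is possible", "conditional"),
]


def failures(classify: Callable[[str], Set[str]]) -> List[Case]:
    """Return the cases whose expected attribute is absent from classify(text)."""
    return [c for c in CASES if c.expected not in classify(c.text)]


def toy_classify(text: str) -> Set[str]:
    """Hypothetical stand-in, NOT cTAKES: only sees a leading 'likely'/'possible'."""
    first_word = text.split()[0].lower()
    return {"conditional"} if first_word in {"likely", "possible"} else set()


if __name__ == "__main__":
    for case in failures(toy_classify):
        print(f"missed {case.expected!r}: {case.text}")
```

Passing a different classify function to failures() gives the list of
sentences that are still missed, which makes it easy to compare models.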
