ctakes-user mailing list archives

From britt fitch <britt.fi...@wiredinformatics.com>
Subject Re: How to train new models for the ClearTK based assertion analyzers?
Date Fri, 09 Jan 2015 18:13:32 GMT
Hi Lance,
There was a thread on the dev list about manually creating and releasing gold standard
annotations in the Anafora format.
This is now possible thanks to John Green's manually generated example notes.


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110

> On Jan 9, 2015, at 12:35 PM, Lance Eason <lance@iodinesoftware.com> wrote:
> The new ClearTK assertion analyzers in 3.2.1 (GenericCleartkAnalysisEngine, HistoryCleartkAnalysisEngine,
> etc.) are a welcome change in that they're much, much faster than the previous
> MedFacts implementation.  Unfortunately, I'm finding them significantly less accurate
> at actually flagging the assertion attributes correctly.
> Could someone point me in the direction of how to train new models?  I've found
> org.apache.ctakes.assertion.train.TrainAttributeModels, which looks promising, but I can't
> find the current training data anywhere (I'd like to use it as the starting point), and
> without samples I have no idea what format it's expecting.
> Some real world examples from clinical notes:
> I would have expected each of the following to be 'generic' (or maybe 'conditional')
> as they're referring to a hypothetical future problem.  (To be fair the previous implementation
> did no better on these):
>    - Gel foam cushion is also required in order to prevent pressure ulcers from forming,
>      as patient will spend many hours of the day in chair.
>    - Ordered wound care protocol and applied mepliex to bottom to prevent skin breakdown.
>    - Encouraged patient to shift weight more frequently to prevent any pressure ulcers.
>    - Showed patient how to use foam to avoid pressure ulcers.
>    - Gave her pamphlet about pressure ulcers.
>    - I educated patient on causes and prevention of pressure ulcers.
>    - Instructed on need to adjust position Q2 hours to avoid pressure ulcers.
> This statement is about as direct an instance of negation as possible, yet the new
> models rule it conditional instead of negated:
>    - Patient does not have pneumonia.
> "Ruled out" is no longer understood as negation:
>    - Ruled out pneumonia.
> "h/o" is no longer understood as historic ("hx" and "history of" are still picked up):
>    - h/o heart failure.
> Trailing conditionals are no longer picked up:
>    - "Likely tuberculosis" (found) vs. "Tuberculosis likely" (missed)
>    - "Possible tuberculosis" (found) vs. "Tuberculosis is possible" (missed)
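
The misclassified examples above could be kept as a small regression set for comparing retrained models against the current ones. The following is only a sketch: the label strings follow the wording of this message rather than any cTAKES attribute names, and `predict` is a hypothetical callable standing in for whatever wraps the assertion pipeline; nothing here calls a real cTAKES or ClearTK API.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    sentence: str   # text taken from the examples in this message
    mention: str    # the problem mention whose assertion status is checked
    expected: str   # label as worded in this message (an assumption, not a cTAKES constant)


CASES: List[Case] = [
    Case("Encouraged patient to shift weight more frequently to prevent any pressure ulcers.",
         "pressure ulcers", "generic"),
    Case("Showed patient how to use foam to avoid pressure ulcers.",
         "pressure ulcers", "generic"),
    Case("Patient does not have pneumonia.", "pneumonia", "negated"),
    Case("Ruled out pneumonia.", "pneumonia", "negated"),
    Case("h/o heart failure.", "heart failure", "historical"),
    Case("Tuberculosis likely", "Tuberculosis", "conditional"),
    Case("Tuberculosis is possible", "Tuberculosis", "conditional"),
]


def failures(predict: Callable[[str, str], str]) -> List[Case]:
    """Return the cases where predict(sentence, mention) differs from the expected label."""
    return [c for c in CASES if predict(c.sentence, c.mention) != c.expected]


if __name__ == "__main__":
    # Placeholder predictor that never flags anything; replace it with a
    # wrapper around the real pipeline to get a meaningful report.
    def never_flags(sentence: str, mention: str) -> str:
        return "none"

    for case in failures(never_flags):
        print(f"MISSED [{case.expected}] {case.sentence}")
```

Running the same set before and after retraining would show whether a new model recovers the cases the previous MedFacts implementation handled (such as "Ruled out" and "h/o") without depending on a full gold standard corpus.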
