From: Lance Eason
Date: Fri, 9 Jan 2015 11:35:29 -0600
Subject: How to train new models for the ClearTK based assertion analyzers?
To: user@ctakes.apache.org

The new ClearTK assertion analyzers in 3.2.1 (GenericCleartkAnalysisEngine, HistoryCleartkAnalysisEngine, etc.) are a welcome change in that they're much, much faster than the previous MedFacts implementation. Unfortunately, I'm finding them significantly less accurate at actually flagging the assertion attributes correctly.

Could someone point me in the direction of how to train new models? I've found org.apache.ctakes.assertion.train.TrainAttributeModels, which looks promising, but I can't find the current training data anywhere (I'd like to use it as the starting point), and without samples I have no idea what format it expects.

Some real-world examples from clinical notes:

I would have expected each of the following to be 'generic' (or maybe 'conditional'), as they refer to a hypothetical future problem. (To be fair, the previous implementation did no better on these.)
   - Gel foam cushion is also required in order to prevent pressure ulcers from forming, as patient will spend many hours of the day in chair.
   - Ordered wound care protocol and applied mepliex to bottom to prevent skin breakdown.
   - Encouraged patient to shift weight more frequently to prevent any pressure ulcers.
   - Showed patient how to use foam to avoid pressure ulcers.
   - Gave her pamphlet about pressure ulcers.
   - I educated patient on causes and prevention of pressure ulcers.
   - Instructed on need to adjust position Q2 hours to avoid pressure ulcers.

This statement is about as direct an instance of negation as possible, yet the new models rule it conditional rather than negated:
   - Patient does not have pneumonia.

"Ruled out" is no longer understood as negation:
   - Ruled out pneumonia.

"h/o" is no longer understood as historic ("hx" and "history of" are still picked up):
   - h/o heart failure.

Trailing conditionals are no longer picked up:
   - "Likely tuberculosis" (found) vs. "Tuberculosis likely" (missed)
   - "Possible tuberculosis" (found) vs. "Tuberculosis is possible" (missed)
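To make these regressions easy to re-check against each model version, here is a minimal Python sketch. It encodes the example sentences with the labels I expected, plus a toy cue-based classifier (loosely in the spirit of NegEx's pre- and post-trigger lists) showing that both leading and trailing cues can be caught. The cue lists, label strings, `toy_assertion` and `failures` are hypothetical illustrations, not cTAKES or ClearTK APIs; to check a real pipeline you would pass `failures` your own function mapping a sentence and concept to a label.

```python
"""Regression cases from the examples above, plus a toy cue-based baseline.

Hypothetical sketch: the cue lists, labels, toy_assertion() and failures()
are illustrative assumptions, not part of cTAKES or ClearTK.
"""
import re

# (sentence, concept, expected label) as described in the examples above.
CASES = [
    ("Gel foam cushion is also required in order to prevent pressure ulcers "
     "from forming, as patient will spend many hours of the day in chair.",
     "pressure ulcers", "generic"),
    ("Ordered wound care protocol and applied mepliex to bottom to prevent "
     "skin breakdown.", "skin breakdown", "generic"),
    ("Encouraged patient to shift weight more frequently to prevent any "
     "pressure ulcers.", "pressure ulcers", "generic"),
    ("Showed patient how to use foam to avoid pressure ulcers.", "pressure ulcers", "generic"),
    ("Gave her pamphlet about pressure ulcers.", "pressure ulcers", "generic"),
    ("I educated patient on causes and prevention of pressure ulcers.",
     "pressure ulcers", "generic"),
    ("Instructed on need to adjust position Q2 hours to avoid pressure ulcers.",
     "pressure ulcers", "generic"),
    ("Patient does not have pneumonia.", "pneumonia", "negated"),
    ("Ruled out pneumonia.", "pneumonia", "negated"),
    ("h/o heart failure.", "heart failure", "historic"),
    ("Likely tuberculosis", "tuberculosis", "conditional"),
    ("Tuberculosis likely", "tuberculosis", "conditional"),
    ("Possible tuberculosis", "tuberculosis", "conditional"),
    ("Tuberculosis is possible", "tuberculosis", "conditional"),
]

# Hypothetical cues that appear BEFORE the concept; checked in this order, first match wins.
PRE_CUES = {
    "negated": [r"\bdoes not have\b", r"\bno\b", r"\bdenies\b", r"\bruled out\b"],
    "historic": [r"\bh/o\b", r"\bhx\b", r"\bhistory of\b"],
    "conditional": [r"\blikely\b", r"\bpossible\b"],
    "generic": [r"\bprevent(ion)?\b", r"\bavoid\b", r"\bpamphlet about\b"],
}
# Hypothetical trailing cues, e.g. "Tuberculosis likely", "Tuberculosis is possible".
POST_CUES = {
    "negated": [r"\bruled out\b"],
    "conditional": [r"\blikely\b", r"\bpossible\b"],
}


def toy_assertion(sentence, concept):
    """Label `concept` in `sentence` by the first leading cue, else the first trailing cue."""
    text = sentence.lower()
    start = text.find(concept.lower())
    if start < 0:
        raise ValueError(f"{concept!r} not found in {sentence!r}")
    before, after = text[:start], text[start + len(concept):]
    for label, cues in PRE_CUES.items():
        if any(re.search(cue, before) for cue in cues):
            return label
    for label, cues in POST_CUES.items():
        if any(re.search(cue, after) for cue in cues):
            return label
    return "none"


def failures(classify):
    """Return (sentence, expected, got) for every case `classify` gets wrong."""
    misses = []
    for sentence, concept, expected in CASES:
        got = classify(sentence, concept)
        if got != expected:
            misses.append((sentence, expected, got))
    return misses


if __name__ == "__main__":
    for sentence, expected, got in failures(toy_assertion):
        print(f"MISS: {sentence!r} expected {expected}, got {got}")
```

The first-match precedence (negated, then historic, conditional, generic) is a deliberate simplification; a trained assertion model weighs all of these cues jointly, which is exactly why having a fixed regression set to compare model versions against seems useful.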