ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Thu, 30 Jul 2015 14:07:37 GMT
Hi Ted/Jay,
Thanks for suggesting and taking this up….
What information will be needed to accomplish what you were thinking?
Just thinking aloud here:

1)      Test data.  I think John Green crafted about 20-30 notes in the data folder.  We can
use this as a starting point.

2)      Code to run through the various components and pipelines?

3)      Environments to run through different O/S/hardware, etc.?

4)      Create a Gold Standard format (Knowtator and/or Anafora).  cTAKES already has existing
readers for those. [For ML based examples?]

I think there is a ctakes-regression project that we can probably just overwrite for new
regression testing code.
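
A minimal sketch of how item 4 could feed a regression check, assuming the gold-standard and system annotations have already been read into (begin, end, label) spans. The names `Span`, `score`, and `check_regression` are hypothetical and are not cTAKES APIs; exact-match scoring and the tolerance value are likewise assumptions, only one of several reasonable ways to compare output against a gold standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation record: character offsets plus a label."""
    begin: int
    end: int
    label: str


def score(gold, predicted):
    """Return (precision, recall, f1) using exact span-and-label matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    # Treat an empty side as perfect for that measure (an assumed convention).
    precision = true_positives / len(pred_set) if pred_set else 1.0
    recall = true_positives / len(gold_set) if gold_set else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def check_regression(gold, predicted, baseline_f1, tolerance=0.01):
    """Pass if F1 has not dropped more than `tolerance` below the baseline."""
    _, _, f1 = score(gold, predicted)
    return f1 >= baseline_f1 - tolerance


if __name__ == "__main__":
    # Made-up spans for illustration only; not taken from any real note.
    gold = [Span(0, 5, "A"), Span(10, 15, "B")]
    predicted = [Span(0, 5, "A"), Span(20, 25, "B")]
    print(score(gold, predicted))
    print(check_regression(gold, predicted, baseline_f1=0.5))
```

A test harness along these lines would store a baseline F1 per pipeline and fail the build when a code change pushes the score below it.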

From: Ted Strall [mailto:tstrall@yahoo.com]
Sent: Thursday, July 30, 2015 9:21 AM
To: Chen, Pei; dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical
Narratives

How / when can we go about getting started on this?

________________________________
From: "Chen, Pei" <Pei.Chen@childrens.harvard.edu>
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>; Ted Strall <tstrall@yahoo.com>
Sent: Friday, July 24, 2015 12:52 PM
Subject: RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical
Narratives

Ted- Welcome to the community!
I think this would be a great enhancement.
Jay- I think the BigTop folks did a lot with the smoke and integration tests... Do you know how
they did it? Something we can reuse?
--Pei


-----Original Message-----
From: Ted Strall [mailto:tstrall@yahoo.com.INVALID]
Sent: Friday, July 24, 2015 12:31 PM
To: dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical
Narratives

I would be interested in helping to develop / maintain a regression testing framework for
that.
I'm new to ctakes (and just recently started stalking the dev mailing list) but I've been
a software engineer for 20 years and have done a lot of framework automation stuff that will
probably be required. As I write this, I am working on an automated integration test that
will run on Jenkins that fires up and loads an h2 database, a solr instance, an in-house indexing
pipeline and an in-house search service, indexes 10k documents and executes and evaluates
some canned queries before shutting itself down.
I'm also working on a MS in Predictive Analytics and I am interested in applying machine learning
and NLP to medical informatics, so I would welcome the chance to get dirty with that side
of stuff, also.
From: Jay Vyas <jayunit100.apache@gmail.com>
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Sent: Friday, July 24, 2015 10:44 AM
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical
Narratives

Yes this is very interesting work.

-  If we have access to a large corpus of de-identified records we can regression test the
ctakes platform.

- I can help collaborate on a regression testing framework if someone else wants to help maintain
it.



> On Jul 24, 2015, at 11:12 AM, Pei Chen <chenpei@apache.org> wrote:
>
> Hi,
> Re:
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.sciencedirect.com_science_article_pii_S1532046415001392&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=huK2MFkj300qccT8OSuuoYhy_xEYujfPwiAxhPVz5WY&m=IdFJ0ChLqz9-dg435_5Rea2_0EUPNDw0uCUKnNp_N7k&s=DOgavsLa7IIU0rgq8lxDXTb33J8-4zgCWuKzL83CZyw&e=
> This is very interesting work and I think it would be very valuable
> for the general community.  Is this something that you may be
> interested in contributing/sharing the code with the Apache cTAKES
> community?
> Thanks,
> Pei
