From: "Savova, Guergana"
To: "dev@ctakes.apache.org"
CC: Rohit Shinde
Subject: RE: Medical de-identification
Date: Sun, 22 Mar 2015 16:12:04 +0000
In-Reply-To: <26C83BC2-6054-4182-A8A9-D3FA282F9B5C@wiredinformatics.com>

Agreed - sounds very good!

--guergana

From: britt fitch [mailto:britt.fitch@wiredinformatics.com]
Sent: Sunday, March 22, 2015 11:59 AM
To: dev@ctakes.apache.org
Cc: Rohit Shinde
Subject: Re: Medical de-identification

Sounds good. Starting with some references:

Docs: https://open.med.harvard.edu/wiki/display/SCRUBBER/3.X
Publication: http://www.biomedcentral.com/1472-6947/13/112/abstract (check out the supplemental material as well for additional details on running and improvements)
SVN (old, standalone, Scrubber v.3.x): https://open.med.harvard.edu/wiki/display/SCRUBBER/Software
SVN (initial apache port to ctakes sandbox): https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/

The project started off as a standalone process and became a UIMA pipeline (outside of ctakes). The plan had always been to port this to an optional ctakes module, but we never got that fully implemented.

Some of the parts that need the most attention to get going:

* working with the ctakes type system
* replacing weka (the ML lib) with an ASF 2.0 friendly lib
* a simpler process for building the models

Regarding knowledge, it's good to be familiar with Java, UIMA, decision trees, and ctakes, likely in that order.

While this is still in the sandbox and you are still getting familiar with running it as a standalone app, feel free to ping me and Andy off-list if that's more convenient. Then we can definitely bring it back to the dev list while getting it running in ctakes.

Cheers,
Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
Britt.Fitch@wiredinformatics.com

On Mar 20, 2015, at 7:57 PM, andy mcmurry wrote:

Britt et al: here is a student named Rohit interested in getting the de-identification pipeline running again. Hoping there is still interest in getting this going in ctakes for real. Comments?

---------- Forwarded message ----------
From: "Rohit Shinde"
Date: Mar 20, 2015 5:02 AM
Subject: Re: Medical de-identification
To: "andy mcmurry"
Cc:

I would certainly be interested in "production grade code". The project also sounds interesting. How do I start working on it? I know Java well. What else would I need to know before starting on this project?

On Fri, Mar 20, 2015 at 12:44 PM, andy mcmurry wrote:

Yes, the project is in Java. The code was written for a research project and never made into "production grade code". If you are interested, we would like to turn the scrubber into a solid pipeline. It is 100% Java programming, with the Colt statistical library.

On Mar 19, 2015 7:52 PM, "Rohit Shinde" wrote:

Hi Andy,

Could you please tell me more about that project? I would really like a reply.

Thank you,
Rohit Shinde

On Wed, Mar 18, 2015 at 5:51 PM, Rohit Shinde <rohit.shinde12194@gmail.com> wrote:

Hi Andy,

I am interested in medical de-identification. I would like to know what this project consists of. Is it partially implemented, or does the implementation need to start from scratch? What languages would I need to know? What theoretical background would I need? Also, how complex would this task be? What parts of OpenNLP does this project use?

Thank you,
Rohit Shinde