From: "Chen, Pei"
To: "general@incubator.apache.org"
Subject: RE: [VOTE] [PROPOSAL] cTAKES for the Apache Incubator
Date: Tue, 5 Jun 2012 15:48:55 +0000

Including the original Proposal raw text below this time:

Hi,

We are proposing cTAKES to be an Apache Incubator Project and would like to request that the IPMC vote for cTAKES to join the Incubator.

Below, you will find the original proposal and details.

Please cast your vote:

[ ] +1 to recommend cTAKES to be an Apache Incubator Project
[ ] 0 don't care
[ ] -1 no, don't recommend yet, (because...)

Thanks,
Pei

= cTAKES Proposal =

The following is a proposal for a new top-level project within the ASF.
== Abstract ==

cTAKES (clinical Text Analysis and Knowledge Extraction System) is a natural language processing tool for information extraction from the clinical free text of electronic medical records.

== Proposal ==

cTAKES comprises a collection of components and tooling, written in Java and trained specifically for the clinical domain, that creates rich linguistic and semantic annotations for use by clinical decision support systems and clinical research.

== Background ==

The development of cTAKES was started in 2006 by a team of physicians, computer scientists and software engineers at the Mayo Clinic. The development team was led by Dr. Guergana Savova and Dr. Christopher Chute. cTAKES is released as open source under the Apache License v2.0. The system was deployed at Mayo, where it is currently an integral part of the clinical data management infrastructure and has processed in excess of 80 million clinical notes. Currently, the core development team is based at both Mayo Clinic and Children's Hospital Boston, following Dr. Savova's move to Children's Hospital Boston in early 2010. Additional collaborations with external groups at the University of Colorado, Brandeis University, the University of Pittsburgh, and the University of California at San Diego continue to extend the capabilities of cTAKES into areas such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain. In 2010, cTAKES was adopted by the I2B2 program and is a central component of SHARP Area 4.
The current cTAKES components include:

 * Sentence boundary detector
 * Rule-based tokenizer to separate punctuation from words
 * Normalizer
 * Context dependent tokenizer
 * Part-of-speech tagger
 * Phrasal chunker
 * Dictionary lookup annotator and normalization to an ontology
 * Context annotator
 * Negation detector
 * Dependency parser
 * Constituency parser
 * Semantic Role Labeler
 * Coreference resolver
 * Module for the identification of patient smoking status
 * Drug mention annotator

== Rationale ==

We believe there is a clear gap between the cutting-edge technologies developed in research labs and those used in clinical practice. We believe that moving cTAKES development to the Apache development community will lead to faster innovation, better integration with other open source software, and broader adoption of cTAKES within clinical institutions, and will thereby improve our healthcare system. We believe that having cTAKES at Apache will encourage the development of a basic set of open source components that will jumpstart the efforts of developers at these institutions.

== Initial Goals ==

The initial goals of the proposed project are:

 * Bring the community together at the ASF and make the development process transparent to it
 * Write user documentation for all major components
 * Set up automated builds and continuous integration
 * Automate regression tests
 * Produce an Incubating release

== Current Status ==

=== Meritocracy ===

Some of the initial committers are familiar with Apache's idea of meritocracy; others are not. We will bring everybody to the same level as part of the incubation process.

=== Community ===

cTAKES already has a considerable user base, both in industry and academia.

=== Core Developers ===

See the initial committer list.

=== Alignment ===

cTAKES has tie-ins with several existing Apache projects. We have been building our components using the UIMA framework.
We are also reusing existing Apache projects such as Lucene, Solr, and Maven. We expect these collaborations to strengthen further after our move to Apache, and we expect to experiment with other projects under the Lucene umbrella such as Hadoop and Mahout. Another obvious connection exists to some of the projects under the OpenNLP umbrella.

== Known Risks ==

=== Orphaned products ===

The project has been around for a number of years already; it has a well-established user community and a diverse set of committers.

=== Inexperience with Open Source ===

cTAKES has been an open source project for many years. Many of the developers are already familiar with both open source in general and the ASF in particular.

=== Homogeneous Developers ===

The current group of developers is very diverse: it is spread around the globe and across multiple institutions.

=== Reliance on Salaried Developers ===

Most of the developers are not paid to work specifically on cTAKES, so there is little reliance on salaried developers.

=== Relationships with Other Apache Products ===

NLP is often used in search and in other algorithms that work with unstructured data, so cTAKES is likely to be useful to the Lucene and Solr communities. It also aligns nicely with Mahout, UIMA, and OpenNLP.

=== An Excessive Fascination with the Apache Brand ===

We think the project aligns nicely with the goals of the ASF to disseminate source code to the public free of charge. Clinical NLP has long been the subject of cutting-edge research, but it often lacks community and shared knowledge. We believe that once cTAKES is at the ASF, the Apache brand will help deliver clinical NLP capabilities to a much larger audience, and that a cutting-edge project like cTAKES can likewise further the ASF brand by providing users with tried-and-true, as well as new, natural language processing capabilities.
== Documentation ==

 * https://wiki.nci.nih.gov/display/VKC/cTAKES+2.0
 * http://en.wikipedia.org/wiki/CTAKES

== Initial Source ==

The source code is maintained in SVN on SourceForge:

cTAKES: http://sourceforge.net/projects/ohnlp/

== Source and Intellectual Property Submission Plan ==

The cTAKES source code is already open source under the AL 2.0.

== External Dependencies ==

||'''Library'''||'''License'''||'''Description'''||
||libsvm||BSD||Machine Learning Library||
||UIMA||AL 2.0||Unstructured Information Management Architecture||
||Lucene Core||AL 2.0||Plain Text Search Engine Library||
||OpenNLP||AL 2.0||General Purpose Natural Language Processing Library||
||HSQLDB||BSD||In Memory DB||
||JDOM||Apache Style||Java XML Manipulation Library||
||Open AI FSM||Apache Style||Finite State Machines Toolset||

== Cryptography ==

cTAKES neither provides nor uses any cryptography.

== Required Resources ==

=== Mailing lists ===

 * ctakes-dev
 * ctakes-private
 * ctakes-user
 * ctakes-commits

=== Subversion Directory ===

https://svn.apache.org/repos/asf/incubator/ctakes

=== Issue Tracking ===

Jira: cTAKES

=== Other Resources ===

== Initial Committers ==

||'''Name'''||'''Email'''||'''CLA'''||
||Pei J Chen||pei.chen@childrens.harvard.edu||yes||
||Sean Finan||sean.finan@childrens.harvard.edu||no||
||Guergana K. Savova||guergana.savova@childrens.harvard.edu||no||
||James J Masanz||masanz.james@mayo.edu||no||

== Affiliations ==

== Sponsors ==

=== Champion ===

Jörn Kottmann

=== Nominated Mentors ===

 * Jörn Kottmann
 * Grant Ingersoll
 * Chris A Mattmann

=== Sponsoring Entity ===

The Apache Incubator

On 05/30/2012 11:59 PM, Chen, Pei wrote:
> Hi All,
>
> We would like to propose cTAKES to be an Apache Incubator project.
>
> cTAKES: (clinical Text Analysis and Knowledge Extraction System) is an natural language processing tool for information extraction from electronic medical record clinical free-text. Additional information is available at http://en.wikipedia.org/wiki/CTAKES and https://wiki.nci.nih.gov/display/VKC/cTAKES+2.5 .
>
> The draft proposal document is available at
> http://wiki.apache.org/incubator/cTAKESProposal
>
> We're excited about the opportunity to work with ASF and the community to create an Incubator project for Natural Language Processing for the clinical domain. We'll welcome all feedback on the proposal.
>
> Thanks.
>
> ---
> Pei Chen
> Lead Application Development Specialist
> Childrens Hospital Boston / Harvard Medical School
> 300 Longwood Avenue, Enders 142
> Boston, MA 02115
> tel: (617) 919-4423
> fax: (617) 730-0057
> Pei.Chen@childrens.harvard.edu
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org