From: "Miller, Timothy"
To: "dev@ctakes.apache.org"
Subject: Re: suggestion for default pipelines
Date: Thu, 24 Apr 2014 21:41:28 +0000

Does anyone have a preference between separate factory classes:

    class SentenceDetectorAnnotatorFactory:
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

versus static methods added to the primitive annotators themselves:

    class SentenceDetector (existing)
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

The former can clutter up the class space, while the latter makes the
annotator classes longer, especially if there are multiple variants
(getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
getMeshDictionaryAnnotator(), etc.).

Tim

On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
> It would be nice if uimaFIT provided a Maven plugin to automatically
> generate descriptors for aggregates. If we come up with a convention
> for factories, e.g. a "class with static methods that do not take any
> parameters and that return descriptors", or "methods that bear a
> specific Java annotation" (e.g. @AutoGenerateDescriptor), it should be
> possible to implement such a Maven plugin.
>
> Cheers,
>
> -- Richard
>
> On 16.04.2014, at 05:21, Steven Bethard wrote:
>
>> +1. And note that once you have a descriptor, you can generate the
>> XML, so we should arrange to replace the current XML descriptors with
>> ones generated automatically from the uimaFIT code. That should reduce
>> the synchronization problems that arise when the Java code is changed
>> but the XML descriptor is not.
>>
>> Steve
>>
>> On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy wrote:
>>> The discussion in the other thread with Abraham Tom gave me an idea I
>>> wanted to float to the list. We have been using some uimaFIT pipeline
>>> builders in the temporal project that could perhaps be moved into
>>> clinical-pipeline.
>>> For example, look at this file:
>>>
>>> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
>>>
>>> with the static methods getPreprocessorAggregateBuilder() and
>>> getLightweightPreprocessorAggregateBuilder() [no UMLS].
>>>
>>> So my idea would be to create a class in clinical-pipeline
>>> (CTAKESPipelines) with static methods for some standard pipelines (to
>>> return AnalysisEngineDescriptions instead of AggregateBuilders?):
>>>
>>> getStandardUMLSPipeline(): builds the pipeline currently in
>>>     AggregatePlaintextUMLSProcessor.xml
>>> getFullPipeline(): same as above but with SRL, constituency parsing,
>>>     etc., i.e. every component in cTAKES
>>>
>>> We could then potentially merge our entry points. I think Abraham's
>>> experience shows that they are currently confusing, and probably not
>>> implemented optimally either. For example, either
>>> ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
>>> method to run a uimaFIT-style pipeline. Maybe we can slowly deprecate
>>> our XML descriptors too, unless people feel strongly about keeping them
>>> around.
>>>
>>> Another benefit is that the cTAKES API then becomes trivial: if you add
>>> cTAKES to your pom file, getting a UIMA pipeline is one uimaFIT call:
>>>
>>> builder.add(CTAKESPipelines.getStandardUMLSPipeline());
>>>
>>>
>>> I think this would actually be pretty easy to implement, but I'm hoping
>>> to get some feedback on whether this is a good direction.
>>>
>>> Tim
>

-- 
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.miller@childrens.harvard.edu
617-919-1223
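A minimal sketch of the discovery step a descriptor-generating Maven plugin would need, assuming Richard's second convention: a marker annotation on static, parameterless factory methods. Everything here is hypothetical. `AutoGenerateDescriptor` is only the name floated in the thread, not an existing uimaFIT type. `Descriptor` is a stand-in for UIMA's `AnalysisEngineDescription` so the example compiles with only the JDK. `CTAKESPipelines` mirrors the class proposed in the thread, not existing cTAKES code, and `getDictionaryPipeline` is an invented method showing what the scan skips. The same scan works whether the methods live in a separate factory class or on the annotators themselves, so it does not settle Tim's question either way.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Scans a class for factory methods that follow the hypothetical convention.
class DescriptorScanner {
    // Returns the names of methods that are static, take no parameters,
    // return a Descriptor, and carry the @AutoGenerateDescriptor marker.
    static List<String> findFactoryMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && Descriptor.class.isAssignableFrom(m.getReturnType())
                    && m.isAnnotationPresent(AutoGenerateDescriptor.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names); // getDeclaredMethods() order is unspecified
        return names;
    }

    // Calls each discovered factory method. A real plugin would instead
    // serialize each AnalysisEngineDescription to an XML file.
    static List<Descriptor> generateAll(Class<?> type) throws Exception {
        List<Descriptor> out = new ArrayList<>();
        for (String name : findFactoryMethods(type)) {
            Method m = type.getDeclaredMethod(name);
            out.add((Descriptor) m.invoke(null)); // null receiver: static method
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (Descriptor d : generateAll(CTAKESPipelines.class)) {
            System.out.println(d.name());
        }
    }
}

// Hypothetical marker annotation (name taken from the thread; not in uimaFIT).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AutoGenerateDescriptor {}

// Hypothetical stand-in for org.apache.uima.analysis_engine.AnalysisEngineDescription.
interface Descriptor {
    String name();
}

// Hypothetical factory class in the style proposed in the thread.
class CTAKESPipelines {
    @AutoGenerateDescriptor
    static Descriptor getStandardUMLSPipeline() {
        return () -> "StandardUMLSPipeline";
    }

    @AutoGenerateDescriptor
    static Descriptor getFullPipeline() {
        return () -> "FullPipeline";
    }

    // Takes a parameter, so a generator cannot call it blindly: not selected.
    static Descriptor getDictionaryPipeline(String dictionary) {
        return () -> dictionary;
    }
}
```

Running `main` prints `FullPipeline` and then `StandardUMLSPipeline`. With UIMA on the classpath, each returned `AnalysisEngineDescription` could be written out with its `toXML(Writer)` method (inherited from UIMA's `XMLizable`), which is the step that would let generated XML replace the hand-maintained descriptors.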