From: "Miller, Timothy"
To: "dev@ctakes.apache.org"
Subject: Re: Scaling cTakes
Date: Tue, 9 Dec 2014 19:57:04 +0000

Brandon, depending on your use case you may be able to remove the
dependency parser and constituency parser from the pipeline. This might
also reduce the memory footprint substantially and make multi-jvm
pipelines possible.
Tim


On 12/09/2014 01:28 PM, Finan, Sean wrote:
> Hi Brandon,
>
> You are welcome.
> I was hoping that you'd get the note processing time down to under a
> second with the different lookup, but I guess not. I think that any
> optimization from here really depends upon what information you want
> to extract from the notes.
>
> Sean
> ________________________________________
> From: Geise, Brandon D. [bdgeise@geisinger.edu]
> Sent: Tuesday, December 09, 2014 9:13 AM
> To: dev@ctakes.apache.org
> Subject: RE: Scaling cTakes
>
> Thanks again Sean for the advice. Just changing the pipeline to use
> the fast dictionary quadrupled the processing speed. Any other
> suggestions on performance tuning would be great!
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Friday, December 05, 2014 1:14 PM
> To: dev@ctakes.apache.org
> Subject: RE: Scaling cTakes
>
> Hi Brandon,
>
> It sounds like you've got a decent pipeline set up. To increase the
> speed you could try swapping ctakes-dictionary-lookup for
> ctakes-dictionary-lookup-fast in the AE. Check
> ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml
> for an example. As for the CASPool, I don't think that it will make
> any difference for cTakes.
>
> Sean
> ________________________________________
> From: Geise, Brandon D. [bdgeise@geisinger.edu]
> Sent: Friday, December 05, 2014 12:40 PM
> To: dev@ctakes.apache.org
> Subject: Scaling cTakes
>
> Hi,
>
> I'm new to cTakes and the UIMA framework. I've read most of the UIMA
> documentation and was able to take the BagofCUIGenerator example and
> modify it to read notes from a DB, process them with the UMLS AE in
> the clinical-pipeline against a local DB version of UMLS, and output
> the CUIs to a DB. However, the problem I'm having is that it's
> extremely slow: ~3.5-4 notes a minute. I was hoping I could get some
> hints or advice on speeding the process up. I read there's a patch for
> LVG, but wasn't quite sure how to implement it. Also, from testing
> with the CPE GUI, I don't notice any difference in processing time
> when adjusting the CASPool setting. Some advice on the CASPool would
> be appreciated also.
>
> Thanks,
> Brandon
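As a rough back-of-the-envelope check on the figures quoted in this thread, the sketch below converts the reported throughput (about 3.5-4 notes per minute) into seconds per note, before and after the reported speedup. It assumes "quadrupling" means exactly 4x; the function name and that factor are illustrative assumptions, not measurements from the thread.

```python
def seconds_per_note(notes_per_minute: float) -> float:
    """Convert a throughput in notes/minute to average seconds per note."""
    if notes_per_minute <= 0:
        raise ValueError("throughput must be positive")
    return 60.0 / notes_per_minute


# Figures reported in the thread: roughly 3.5 to 4 notes per minute.
baseline_rates = (3.5, 4.0)

# Assumption: "quadrupling the processing speed" is taken as exactly 4x.
speedup = 4.0

for rate in baseline_rates:
    before = seconds_per_note(rate)
    after = seconds_per_note(rate * speedup)
    print(f"{rate} notes/min: {before:.1f} s/note before, {after:.1f} s/note after")
```

Under that assumption the per-note time lands at around four seconds, which is consistent with the remark that the fast lookup did not bring processing under a second per note.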