ctakes-dev mailing list archives

From jay vyas <jayunit100.apa...@gmail.com>
Subject Re: Scaling cTakes
Date Fri, 05 Dec 2014 19:40:15 GMT
on a tangential note, we do have an example of running ctakes in a massively
parallel system like spark/hadoop.

https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

if your problem is embarrassingly parallelizable, you can use
mapreduce/spark to distribute your app using that as a template (spark
streaming can )
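
to make "embarrassingly parallelizable" concrete: each note is processed
independently of every other note, so the work can be split across workers
with no coordination. below is a minimal single-machine sketch of that
pattern in plain java. it is not the spark template from the sandbox link,
and `ParallelNotes`, `processNote` and `processAll` are hypothetical names;
`processNote` is only a stand-in for a call into a ctakes pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelNotes {

    // Hypothetical stand-in for running one note through a cTAKES pipeline.
    // A real version would invoke the analysis engine and return the CUIs found.
    static String processNote(String note) {
        return note.trim().toUpperCase();
    }

    // Notes are independent, so each one is submitted to a fixed pool of
    // worker threads; results are collected in the original input order.
    static List<String> processAll(List<String> notes, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String note : notes) {
                Callable<String> task = () -> processNote(note);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that note is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> notes = List.of("chest pain", "  fever and cough ");
        System.out.println(processAll(notes, 2));
    }
}
```

in a real setup each worker would probably need its own pipeline instance
rather than a shared one (that is an assumption on my part, not something
tested here). the same per-note function is what you would hand to a
spark or mapreduce map step.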




On Fri, Dec 5, 2014 at 1:29 PM, Geise, Brandon D. <bdgeise@geisinger.edu>
wrote:

> Thanks Sean.  I'll take a look and see if this speeds the pipeline up.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Friday, December 05, 2014 1:14 PM
> To: dev@ctakes.apache.org
> Subject: RE: Scaling cTakes
>
> Hi Brandon,
>
> It sounds like you've got a decent pipeline set up.  To increase the
> speed you could try swapping out use of ctakes-dictionary-lookup with
> ctakes-dictionary-lookup-fast in the AE.  Check
> ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml
> for an example.  As for the CASPool, I don't think that it will make any
> difference for cTakes.
>
> Sean
> ________________________________________
> From: Geise, Brandon D. [bdgeise@geisinger.edu]
> Sent: Friday, December 05, 2014 12:40 PM
> To: dev@ctakes.apache.org
> Subject: Scaling cTakes
>
> Hi,
>
> I'm new to cTakes and the UIMA framework.  I've read most of the UIMA
> documentation and was able to take the BagofCUIGenerator example and modify
> it to read notes from a DB, process them using the UMLS AE in the
> clinical-pipeline with a local DB version of UMLS, and output the CUIs to
> a DB.  However, the problem I'm having is that it's extremely slow: ~3.5-4
> notes a minute.  I was hoping I could get some hints or advice on speeding
> the process up.  I read there's a patch for LVG, but wasn't quite sure how
> to implement it.  Also, from testing using the CPE GUI, I don't notice any
> difference in processing time when adjusting the CASPool setting.  Some
> advice on the CASPool would be appreciated also.
>
> Thanks,
> Brandon
>


-- 
jay vyas
