ctakes-user mailing list archives

From Jonathan Bates <jonrba...@gmail.com>
Subject Re: cTakes Scalability Problem
Date Tue, 01 Jul 2014 12:50:44 GMT
Hi Prasanna,
I am currently using 3.1.2 to process ~40M notes with 14 CPEs running
AggregatePlaintextUMLSProcessor+DBConsumer.  So far, ~34M notes have been
annotated and stored.  Altogether, I'm seeing 0.054 sec/note.  This is with
4.1k rows in v_snomed_fword_lookup.  One modification we had to make was to
change the anno_base_id datatype from 'int' to 'bigint'.  It would be very
interesting to see Hadoop used with cTAKES...
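For context, the reported rate can be sanity-checked with a quick back-of-the-envelope calculation (a sketch only; 0.054 s/note is taken as the aggregate rate across all 14 CPEs):

```python
# Rough check of the reported throughput: 40M notes at 0.054 s/note
notes_total = 40_000_000
sec_per_note = 0.054        # aggregate rate across the 14 CPEs
total_sec = notes_total * sec_per_note
days = total_sec / 86_400   # seconds per day
print(f"{total_sec:,.0f} s total, about {days:.0f} days of wall-clock time")
```

That is, on the order of 25 days of wall-clock time for the full 40M-note corpus at that aggregate rate.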

On Tue, Jul 1, 2014 at 1:54 AM, Prasanna Bala <balkiprasanna1984@gmail.com> wrote:

> Hi,
> I have a few questions about using third-party libraries with cTAKES,
> and about the run time for processing documents. We are able to run
> cTAKES in batch mode, but we now plan to process 1 million clinical
> documents. Has anyone tackled scalability with cTAKES? My idea is to
> distribute the processing using Hadoop. There are various libraries
> that can run UIMA pipelines on Hadoop, and since cTAKES is also built
> on UIMA, I think there should be a way to distribute the processing.
> Has anyone tried this? Are there any limitations to distributing work
> with cTAKES? Your thoughts, please?
> Regards,
> Prasanna
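A real distribution of cTAKES would wrap its UIMA analysis engines on the JVM (e.g. via Hadoop-integration libraries for UIMA, as the question suggests). Purely as an illustration of the partitioning pattern being discussed, here is a minimal sketch in Python with a stand-in `annotate` function (all names hypothetical, not part of cTAKES):

```python
from multiprocessing import Pool

def annotate(doc_id):
    # Stand-in for a real UIMA/cTAKES pipeline call on one document.
    # In practice each worker would hold its own initialized pipeline.
    return (doc_id, f"annotations-for-{doc_id}")

def run_batch(doc_ids, workers=4):
    # Fan the document IDs out across a pool of worker processes;
    # Hadoop plays an analogous role at cluster scale.
    with Pool(workers) as pool:
        return pool.map(annotate, doc_ids)

if __name__ == "__main__":
    results = run_batch(range(100))
    print(f"annotated {len(results)} documents")
```

The key design point is the same either way: documents are independent, so annotation parallelizes trivially once each worker has its own pipeline instance; the bottlenecks tend to be pipeline initialization (dictionary loading) and the shared sink (here, the database consumer).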
