ctakes-user mailing list archives

From Prasanna Bala <balkiprasanna1...@gmail.com>
Subject Re: cTakes Scalability Problem
Date Tue, 01 Jul 2014 15:50:18 GMT

Thanks for your suggestions. So I have to change the "int" datatype to
"bigint" as well to keep the pipeline working at that scale.
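
For context on the "int" to "bigint" change, here is a back-of-the-envelope
sketch (the note count and annotations-per-note figures are assumptions, not
measurements) of why an annotation id column can overflow a 32-bit signed int
at this scale:

```python
# Rough check of why an annotation id can overflow a 32-bit signed int column.
# Assumed figures (hypothetical): ~40M notes, ~100 annotations per note on average.
INT32_MAX = 2**31 - 1  # 2,147,483,647, the ceiling of a signed 'int' column

notes = 40_000_000
annotations_per_note = 100  # hypothetical average
total_annotations = notes * annotations_per_note

print(f"{total_annotations:,}")        # 4,000,000,000
print(total_annotations > INT32_MAX)   # True: a plain 'int' key would overflow
```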

I am looking at UIMA DUCC.

The problem with Hadoop is that it runs as a batch process, so it cannot be
used for low-latency real-time systems. But I still want to explore it.
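
As a rough sanity check on our 1 million document target, a quick estimate
(assuming the ~0.054 sec/note aggregate rate Jon reports below continues to
hold; actual throughput will vary with hardware and dictionary size):

```python
# Back-of-the-envelope runtime estimate for 1M documents at the
# ~0.054 sec/note aggregate rate reported below (14 CPEs combined).
docs = 1_000_000
sec_per_doc = 0.054  # assumed aggregate rate, not a guarantee

total_sec = docs * sec_per_doc
hours = total_sec / 3600
print(f"{total_sec:.0f} s ~= {hours:.1f} h")  # 54000 s ~= 15.0 h
```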

On Tue, Jul 1, 2014 at 6:20 PM, Jonathan Bates <jonrbates@gmail.com> wrote:

> Hi Prasanna,
> I am currently using 3.1.2 to process ~40M notes using 14 CPEs with
> AggregatePlaintextUMLSProcessor+DBConsumer.  So far, ~34M notes have been
> annotated and stored.  Altogether, I'm seeing 0.054sec/note.  This is with
> 4.1k rows in v_snomed_fword_lookup.  One modification we had to make was to
> change anno_base_id datatype from 'int' to 'bigint'.  It would be very
> interesting to see Hadoop used with ctakes...
> -Jon
> On Tue, Jul 1, 2014 at 1:54 AM, Prasanna Bala <balkiprasanna1984@gmail.com
> > wrote:
>> Hi,
>> I have certain clarifications. This is regarding using third party
>> libraries with cTakes. I have clarifications on run time for processing
>> documents using cTakes. We are able to run the cTakes through batch mode.
>> But we have plans to run documents for 1 million clinical documents. Can
>> anyone tell me if they have tackled scalability using cTakes ? I have an
>> idea to distribute the process using Hadoop. There are various libraries
>> available that can use UIMA and distribute the process using Hadoop. Since
>> cTakes is also developed using UIMA, I think there should be a way to
>> distribute process. Have anyone tried this ? Are there any limitations in
>> distributing problems using cTakes ? Your thoughts please ?
>> Regards,
>> Prasanna
