uima-user mailing list archives

From "Greg Holmberg" <holmberg2...@comcast.net>
Subject Re: Scaling using Hadoop
Date Wed, 05 Oct 2011 18:24:33 GMT
On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <twgoetz@gmx.de> wrote:

> On 26/09/11 22:31, Greg Holmberg wrote:
>>
>> This is what I'm doing.  I use JavaSpaces (producer/consumer queue), but
>> I'm sure you can get the same effect with UIMA AS and ActiveMQ.
>
> Or Hadoop.

Thilo, could you expand on this?  Exactly how do you use Hadoop to scale  
UIMA?

What storage do you use under Hadoop (HDFS, HBase, Hive, etc.), and what is
your final storage destination for the CAS data?

Are you doing on-demand, streaming, or batch processing of documents?

What are your key/value pairs?  URLs?  What's your map step, what's your  
reduce step?

How do you partition?  Do you find the system is load balanced?  What  
level of efficiency do you get?  What level of CPU utilization?

Do you do just document (UIMA) analysis in Hadoop, or also collection  
(multi-doc) analytics?

The fit between UIMA and Hadoop isn't obvious to me.  Just trying to  
figure it out.

Thanks,


Greg Holmberg
