uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: UIMA analysis from a database
Date Fri, 15 Sep 2017 19:15:36 GMT
If you have a Hadoop/Spark/YARN cluster, why would you use UIMA-AS?

Afaik UIMA-AS is usually used to run UIMA components as statically
deployed services that communicate with each other via a message queue.

I suppose in a Hadoop/Spark/YARN cluster you'd care more about dynamic
deployment, and instead of a message queue you'd use RDDs, no?


-- Richard

On 15.09.2017, at 20:54, Fox, David <david.fox@optum.com> wrote:
> We're looking to transition a large NLP application processing ~30TB/month
> from a custom NLP framework to UIMA-AS, and from parallel processing on a
> dedicated cluster with custom Python scripts which call GNU parallel, to
> something with better support for managing resources on a shared cluster.
> Both our internal IT/engineering group and our cluster vendor
> (HortonWorks) use and support Hadoop/Spark/YARN on a new shared cluster.
> DUCC's capabilities seem to overlap with these more general purpose tools.
> Although it may be more closely aligned with UIMA for a dedicated
> cluster, I think the big question for us would be how/whether it would
> play nicely with other Hadoop/Spark/YARN jobs on the shared cluster.
> We're also likely to move at least some of our workload to a cloud
> computing host, and it seems like Hadoop/Spark are much more likely to be
> supported there.
