uima-user mailing list archives

From "Fox, David" <david....@optum.com>
Subject Re: UIMA analysis from a database
Date Fri, 15 Sep 2017 20:05:31 GMT
Still pretty new to this, but our pipeline will likely include at least
one annotator written in C++ using the UIMA C++ API.  My understanding
(from https://uima.apache.org/doc-uimacpp-huh.html) was that there are
issues (particularly “2. Runtime problems in the C++ code can crash the
entire JVM process.”) with invoking a C++ annotator from a JVM process via
JNI.  We were hoping to avoid that with UIMA-AS, but my understanding of
both UIMA-AS and Hadoop is limited, so your question may very well be a
good one.

David

On 9/15/17, 3:15 PM, "Richard Eckart de Castilho" <rec@apache.org> wrote:

>If you have a Hadoop/Spark/YARN cluster, why would you use UIMA-AS?
>
>Afaik UIMA-AS is usually used to run UIMA components as statically
>deployed services that communicate with each other via a message queue.
>
>I suppose in a Hadoop/Spark/YARN cluster you'd care more about dynamic
>deployment and instead of a message queue I suppose you'd use RDDs, no?
>
>Cheers,
>
>-- Richard
>
>On 15.09.2017, at 20:54, Fox, David <david.fox@optum.com> wrote:
>>
>> We're looking to transition a large NLP application processing
>> ~30TB/month from a custom NLP framework to UIMA-AS, and from parallel
>> processing on a dedicated cluster with custom Python scripts which call
>> GNU parallel, to something with better support for managing resources
>> on a shared cluster.
>>
>> Both our internal IT/engineering group and our cluster vendor
>> (HortonWorks) use and support Hadoop/Spark/YARN on a new shared cluster.
>> DUCC's capabilities seem to overlap with these more general-purpose
>> tools. Although it may be more closely aligned with UIMA for a dedicated
>> cluster, I think the big question for us would be how/whether it would
>> play nicely with other Hadoop/Spark/YARN jobs on the shared cluster.
>> We're also likely to move at least some of our workload to a cloud
>> computing host, and it seems like Hadoop/Spark are much more likely to
>> be supported there.
>
>
