uima-user mailing list archives

From Spico Florin <spicoflo...@gmail.com>
Subject Administration of UIMA AS pipeline challenge. Please advice. Architectural aspects
Date Mon, 26 Mar 2012 20:06:12 GMT
  Currently I'm working on a project that:
1. collects news from the Internet;
2. passes the news to a UIMA pipeline that should identify entities
(organizations, locations, etc.). Here we are using 4 annotators;
3. determines the accuracy of a given subject category (using
prediction models). This uses 9 annotators, and the number will increase;
4. should support more types of annotators in the pipelines;
5. should stop the entire annotation process (all the pipelines) when
one analysis engine fails with an exception or is dead;
6. has all of the remote analysis engines located on the same
machine together with the parallel flow controller (one server machine
is used for all annotators).
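  To make requirement 5 concrete, here is a minimal sketch of the
"one fails, all stop" behavior in plain Java. It is not part of UIMA AS:
PipelineWatchdog, register, findDead and checkOnce are hypothetical names,
and the sketch assumes that each service can be given a liveness probe
and a stop action from the outside.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical watchdog (not a UIMA AS class): stops every service when any one is dead. */
public class PipelineWatchdog {
    // Service name -> probe that returns true while the service is alive.
    private final Map<String, BooleanSupplier> probes = new LinkedHashMap<>();
    // Service name -> action that stops the service.
    private final Map<String, Runnable> stoppers = new LinkedHashMap<>();

    public void register(String name, BooleanSupplier isAlive, Runnable stop) {
        probes.put(name, isAlive);
        stoppers.put(name, stop);
    }

    /** Returns the names of services whose probe reported "dead" or threw an exception. */
    public List<String> findDead() {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : probes.entrySet()) {
            boolean alive;
            try {
                alive = entry.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                alive = false; // a failing probe is treated as a dead service
            }
            if (!alive) {
                dead.add(entry.getKey());
            }
        }
        return dead;
    }

    /** One monitoring pass: if any service is dead, run every stop action. */
    public List<String> checkOnce() {
        List<String> dead = findDead();
        if (!dead.isEmpty()) {
            for (Runnable stop : stoppers.values()) {
                try {
                    stop.run();
                } catch (RuntimeException ex) {
                    // keep going so the remaining services are still stopped
                }
            }
        }
        return dead;
    }
}
```

If each service were launched from Java with ProcessBuilder, a process p
could be registered as register("ner", p::isAlive, p::destroy), and
checkOnce() could be called periodically from a scheduled task.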
  Based on the above, I've created 2 pipelines that work in parallel
(both listening on a JMS topic) and that were built with separation of
concerns in mind. One pipeline is for entity recognition and the other
one is for prediction. Both of them use an Aggregate Analysis Engine
based on the Parallel Flow Controller. In order to benefit from
parallel annotation, both pipelines are built using remote clients
(approximately one remote client per annotator).
  Given the above scenario, I have the following concerns and questions:
1. Is my approach correct with regard to the requirements? Is there a
better way to design the pipeline (perhaps as one single pipeline)?
2. Having so many remote analysis engines started (via scripts using
deployAsyncService), I find them difficult to manage, especially
concerning the 5th requirement. Is there any support for
monitoring (viewing alive statuses) and operating (such as start, stop)
these services (besides JMX jconsole)? Perhaps you can recommend a
good application (Nagios?).
3. Regarding the 5th requirement, I've observed that using the
allowContinueOnFailure feature of the parallel flow stops the pipeline
processing when one component is down, but only when the AEs are
collocated. What about remote analysis engines? Is there any way to
detect that one AE is down and then alert all the other AE processes
(remote AEs) of the pipeline so that they are killed?
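  As an illustration of the kind of "alive status" check meant in
question 2, here is a minimal sketch of a TCP liveness probe in plain
Java, similar in spirit to what a generic monitoring tool does. It
assumes that each service JVM exposes a known port (for example a JMX
remote port set via -Dcom.sun.management.jmxremote.port, which is an
assumption about the deployment, not something configured here).
PortProbe and isListening are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not a UIMA AS class): checks whether something accepts TCP connections. */
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // connection refused, timeout, or unreachable host: treat as not alive
            return false;
        }
    }
}
```

An open port only shows that the JVM is up and listening. It does not
prove that the analysis engine inside it is still processing CASes, so
this is a coarse check rather than a full health test.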
  Based on the above, I would appreciate any help, advice,
or suggestions from the UIMA community.
  Thank you for your patience and support.
