uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: P2P UIMA
Date Mon, 05 Jan 2009 20:56:15 GMT

Yosi Mass wrote:
> Hi,
> I would like to suggest a scale-out of UIMA by enabling it to run in a P2P
> environment.
> From my understanding, the CPE is a 1st generation scaleout, and it can run
> a distributed pipeline using Vinci/SOAP but the machines involved in the
> pipeline are predefined in the UIMA descriptors.
> The 2nd generation scaleout is called UIMA-AS (AS = Asynchronous Scaleout),
> and is based on some Java and web standards, such as JMS (Java Messaging
> Service).  It is now officially released on Apache UIMA.  This allows users
> to selectively choose which parts of their pipeline to run in this mode,
> which in turn allows scaling out individual parts of the pipeline, as
> needed. Again there is no dynamic discovery of resources after startup.
Hmm, I think this may not be quite accurate.  In UIMA-AS, connections
are made using a JMS infrastructure, such as ActiveMQ.  Each service has
an associated "address" in the network space, made up of a Broker URL
(which includes the port) and a queue name.

The service itself is implemented by one or more servers that attach to
that queue at the broker.  During a run, servers can be dynamically
added or removed; this changes the "capacity" of the service.  Of
course, if all of the servers for a particular service are removed, then
the service "fails".
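The capacity behavior described above can be pictured with a small Java
model.  This is only a sketch under stated assumptions: `ServiceAddress`,
`attach`, `detach` and `capacity` are hypothetical names, not part of the
UIMA-AS or JMS APIs, and the service address is modeled as a broker URL
plus a queue name.  Servers consuming from the same queue act as
competing consumers, so adding or removing one changes capacity, and only
removing the last one makes the service unavailable.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of a UIMA-AS-style service address shared by competing servers.
// Hypothetical class: real UIMA-AS services are deployed through JMS/ActiveMQ,
// not through this API.
final class ServiceAddress {
    final String brokerUrl;  // e.g. "tcp://localhost:61616"
    final String queueName;  // hypothetical queue name for one service
    // Servers currently consuming requests from this queue.
    private final Set<String> servers = new LinkedHashSet<>();

    ServiceAddress(String brokerUrl, String queueName) {
        this.brokerUrl = brokerUrl;
        this.queueName = queueName;
    }

    // A new server starts listening on the queue: capacity grows.
    void attach(String serverId) {
        servers.add(serverId);
    }

    // A server stops listening on the queue: capacity shrinks.
    void detach(String serverId) {
        servers.remove(serverId);
    }

    // In this model, "capacity" is just the number of attached servers.
    int capacity() {
        return servers.size();
    }

    // With zero servers, requests sent to the queue are not processed,
    // which is the case described above as the service "failing".
    boolean isAvailable() {
        return !servers.isEmpty();
    }
}
```

A set is used so that re-registering the same server does not inflate the
count; in use, servers would call `attach`/`detach` as they come and go.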

But maybe what is meant is, rather, the ability of the system to
recognize when a service becomes available, not merely when its
capacity changes.  For instance, in the UIMA-AS case, this could mean
a couple of things:

1) allowing a service to be configured with 0 servers available at startup

2) having the flow controller "know" more explicitly about service
"availability", for instance, the number of "servers" there might be for
a particular service.  Here, the idea would be that a flow controller
could dynamically decide, based on the current service level of the
different steps in the pipeline, how to "route" a CAS within a
particular aggregate.

Are these the kinds of functions that are desired?
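For the second item, the routing decision itself might look roughly like
the following Java sketch.  It assumes the flow controller can somehow
learn a per-service server count (how it would obtain that is left open),
and `AvailabilityRouter` and `route` are hypothetical names rather than
UIMA's `FlowController` API.  Services configured with 0 servers at
startup (the first item) are simply skipped until servers attach.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical availability-aware routing decision for one step of an
// aggregate. This is not UIMA's FlowController interface; it only shows
// the decision such a controller might make.
final class AvailabilityRouter {
    // service name -> servers currently attached (0 is allowed at startup)
    private final Map<String, Integer> serverCounts;

    AvailabilityRouter(Map<String, Integer> serverCounts) {
        this.serverCounts = serverCounts;
    }

    // Among the services able to perform a step, pick the one with the most
    // servers. An empty result means no candidate has any servers yet, so the
    // caller can hold the CAS instead of sending it to an unserved queue.
    Optional<String> route(List<String> candidates) {
        return candidates.stream()
                .filter(s -> serverCounts.getOrDefault(s, 0) > 0)
                .max(Comparator.comparingInt(s -> serverCounts.getOrDefault(s, 0)));
    }
}
```

Returning an empty `Optional` rather than picking an unserved service
leaves the choice of waiting, retrying, or failing the CAS to the caller.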
> I would like to suggest a 3rd generation scaleout using a fully
> decentralized P2P network. Assume that each peer can publish its
> capabilities (namely which annotators it can run) and its current
> availability, then we may extend the UIMA/UIMA-AS pipeline to discover an
> available and capable peer for running an annotator and thus achieve better
> load balancing and thus better performance than previous generations.
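One way to picture the proposed publish/discover cycle is the following
Java sketch.  Everything in it is hypothetical: `PeerAd`, `PeerDirectory`,
the free-slot count as the measure of "availability", and the
time-to-live used to ignore stale advertisements are all assumptions for
illustration.  A real decentralized network would spread advertisements
through some P2P overlay rather than a single local map.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical advertisement a peer publishes: the annotators it can run and
// how many more CASes it is currently willing to accept.
final class PeerAd {
    final String peerId;
    final Set<String> annotators;
    final int freeSlots;
    final long publishedAtMillis;

    PeerAd(String peerId, Set<String> annotators, int freeSlots, long publishedAtMillis) {
        this.peerId = peerId;
        this.annotators = annotators;
        this.freeSlots = freeSlots;
        this.publishedAtMillis = publishedAtMillis;
    }
}

// One peer's local view of the advertisements it has received. A real P2P
// overlay would gossip or route these; here they are kept in a plain map.
final class PeerDirectory {
    private final Map<String, PeerAd> latest = new HashMap<>();
    private final long ttlMillis; // ads older than this are treated as stale (assumption)

    PeerDirectory(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Keep only the newest advertisement from each peer.
    void publish(PeerAd ad) {
        latest.merge(ad.peerId, ad,
                (old, fresh) -> fresh.publishedAtMillis >= old.publishedAtMillis ? fresh : old);
    }

    // Find a fresh, capable peer with spare capacity, preferring the one with
    // the most free slots as a simple load-balancing rule.
    Optional<String> discover(String annotator, long nowMillis) {
        return latest.values().stream()
                .filter(ad -> nowMillis - ad.publishedAtMillis <= ttlMillis)
                .filter(ad -> ad.freeSlots > 0)
                .filter(ad -> ad.annotators.contains(annotator))
                .max(Comparator.comparingInt(ad -> ad.freeSlots))
                .map(ad -> ad.peerId);
    }
}
```

Picking the peer with the most free slots is only the simplest possible
load-balancing rule; ties and fairness across many clients are not
addressed here.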
The "publication" would need to include the type system of the
annotators, and some notion of which annotators would ever "want" to be
run together in a pipeline, because a key part of the UIMA design is the
"merging" of type systems to allow interoperability among the parts.

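The type-system requirement can be illustrated with a simplified
compatibility check that a discovering client might run over the type
systems advertised by candidate annotators.  This is a hypothetical
sketch (`TypeDecl` and `TypeSystemMerger` are invented names): features
of same-named types are unioned, and a feature declared with two
different ranges, or a type declared with two different supertypes, is
reported as a conflict.  UIMA's own type-system merging (for example in
`CasCreationUtils`) has its own rules, which this sketch does not try to
reproduce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified type declaration: a type name, its supertype, and its
// features (feature name -> range type name). Not UIMA's TypeSystemDescription API.
final class TypeDecl {
    final String name;
    final String supertype;
    final Map<String, String> features;

    TypeDecl(String name, String supertype, Map<String, String> features) {
        this.name = name;
        this.supertype = supertype;
        this.features = features;
    }
}

// Accumulates the advertised type systems of annotators that would run together
// and records every conflict found along the way.
final class TypeSystemMerger {
    final Map<String, String> supertypes = new LinkedHashMap<>();
    final Map<String, Map<String, String>> features = new LinkedHashMap<>();
    final List<String> conflicts = new ArrayList<>();

    void add(List<TypeDecl> typeSystem) {
        for (TypeDecl t : typeSystem) {
            // The first declaration of a type fixes its supertype; later ones must agree.
            String prevSuper = supertypes.putIfAbsent(t.name, t.supertype);
            if (prevSuper != null && !prevSuper.equals(t.supertype)) {
                conflicts.add(t.name + ": supertype " + prevSuper + " vs " + t.supertype);
            }
            // Features are unioned; a feature declared twice must keep the same range.
            Map<String, String> merged = features.computeIfAbsent(t.name, k -> new LinkedHashMap<>());
            for (Map.Entry<String, String> f : t.features.entrySet()) {
                String prevRange = merged.putIfAbsent(f.getKey(), f.getValue());
                if (prevRange != null && !prevRange.equals(f.getValue())) {
                    conflicts.add(t.name + ":" + f.getKey() + ": range " + prevRange + " vs " + f.getValue());
                }
            }
        }
    }

    // In this model, annotators can share a pipeline only if their types merged cleanly.
    boolean compatible() {
        return conflicts.isEmpty();
    }
}
```

Running such a check at discovery time would let a client reject a
capable and available peer whose annotator could never interoperate with
the rest of its pipeline.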
Is there a "reservation" idea here too?  For instance, in an open
environment, where there are lots of clients and services and servers
for those services, a particular client might want to reserve some
amount of processing capability for itself, (not necessarily all of the

Finally, I wonder -- are there systems / infrastructure / middleware
already out there that do this kind of thing that we could perhaps
easily adapt / adopt for these purposes?

> What do people on the list think about this?
> Thanks, Yosi
