commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kris Nuttycombe" <>
Subject Re: [chain] Pipeline implementation
Date Fri, 17 Sep 2004 23:29:19 GMT

Alex Karasulu wrote:

>Cool.  We actually don't have our own thread pool but abstracted it away
>so we can use any thread pool.  For testing purposes we use the commons
>sandbox thread pool.  We made this API thinking it'll be embedded within
>other servers or frameworks that already have their own thread pool
>implementation.  We just adapt other thread pools to our ThreadPool
>interface using a wrapper.  This might be something you might want to do
>too when you go multithreaded.  Helps not have to carry around runtime
>depenencies to other projects. 

That sounds like a reasonable route to take. This abstract wrapper 
actually sounds like something that might be appropriate for another 
commons component, sort of like commons-logging does for logging APIs.

>>The model that we've been using is that stages process data instead of 
>>events, although one could certainly consider an event as a type of 
>Ahh that's interesting.  We do the events and carry a payload.  One of
>the benefits we get with using an EventObject derived event is a nice
>event type heirarchy that can be used to filter when routing events. 
>Also we can associate other peices of information with the data to
>control how it is processed.  One of the things we're working on in
>particular where this is coming in handy is for synchronization within
>the staged pipeline.  There's much we have to do here though.  I'm
>toying with implementing rendezvous points for events and using other
>constructs for better processing control of entire pipelines.
How is routing handled if you have multiple subscribers registered that 
can handle the same type of event? Is it possible under your framework 
to explicitly specify the notification sequence such that an event is 
handled by one subscriber which raises the same variety of event to be 
handled by the next subscriber? I ask because one of the main things we 
do with our implementation is sequential processing where the input data 
may be valid for a number of stages, but the order of the stages is 
critical for correct results to be produced. For example, we do a lot of 
processing of spatial data. A reader stage may generate a sequence of 
geometry objects from a datafile which are enqueued on a subsequent 
stage. We may apply line generalization and polygon splitting algorithms 
to these data sequentially, and the resulting values are quite different 
depending upon which order these filters are specified in. The stages 
that wrap these algorithms are generic; the only information that they 
have is that they should expect the input queue to contain geometries, 
or perhaps beans that have one or more properties that are geometries. 
The execution order is defined externally in a configuration file. We 
don't suffer from strong coupling between stages because we give up on 
compile-time type safety and leave it up to the stage to determine what 
to do if it's fed an object of an unexpected type.

>>data.  Our stages have to be aware of the  pipelines in which they 
>>reside because our Stage interface defines additional "exqueue(Object 
>>obj)" and "exqueue(String key, Object obj)" methods which are used to 
>>enqueue an object on either a subsequent stage or a keyed pipeline 
>>branch, respectively.
>I had the same problem which created a high degree of coupling between
>stages.  Since stages were implemented in IoC frameworks sometimes there
>were complaints in complex systems where cycles were introduced.  I
>started using a simple pub/sub event router/hub to decouple theses
>stages.  Ohhh looks like you ask about that below...

The way that we avoided problems with cycles was pretty simple; we don't 
allow a stage to exist in more than a single pipeline simultaneously. 
Stages are pretty lightweight so it's not a hassle to create multiple 
stages of a single type if that's what's needed. If there's an explicit 
need to create a cycle, it's trivial to write a stage that injects data 
it receives elsewhere in the pipeline and you treat it like recursion, 
making sure that the exit condition is well specified somewhere within 
the cycle.

>We have a service we've defined called the EventRouter along with
>Subscriber's and Subscriptions.  It's like the core dependency; instead
>of having every stage depend on others downstream we make each stage
>dependent on the EventRouter.  Basically this forms a hub and spoke like
>dependency relationship between the stages and the event
>broker/router/hub whatever you like to call it.  Now we can dynamically
>register new Subscriptions with it to route events to different stages. 
>We use the event router to handle configuration events while tying
>together the pipleline as well as for the inband processing of data
>flowing through the system.
So is the usual use case that a StageHandler will handle one event and 
generate another that is referred back to the router for handling? 
Doesn't this cause problems if your StageHandler raises the same variety 
of event that caused it to be invoked in the first place?


Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message