commons-dev mailing list archives

From "Rory Winston" <>
Subject RE: [chain] Pipeline implementation
Date Wed, 22 Sep 2004 08:59:21 GMT
Oops. Looks like my line length has messed up my ASCII art :)

The actual pipeline should look like this:

 Connector A (FTP)
   ------> Pipeline Stage A (transform line data to XML)
   ------> Pipeline Stage B (transform XML to PDF)
   ------> Connector B

-----Original Message-----
From: Rory Winston []
Sent: 22 September 2004 09:53
To: Jakarta Commons Developers List
Subject: RE: [chain] Pipeline implementation

I've been following bits of this thread (will re-read the whole thread later
when I have time), and I am fascinated with this approach. I have been
mulling over a similar implementation for a new framework, which comes from
real-world requirements in my work. The basic idea will be a pipeline, which
may have many inputs, and many outputs. The outputs are configurable, and
could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
one end of the pipeline via a connector, and passes through each stage in
the pipeline. Each individual stage takes an input and produces an output,
and the "work" done in each pipeline stage can be performed via a plugin, of
sorts. A hook will be provided in each stage so that a developer could
insert his/her custom code. Here is a simple diagram of what I have in mind.
This is a trivial plugin that reads line data files from an FTP connector,
transforms the line data to XML via a predefined transform, then passes
through another pipeline stage that transforms the XML files to PDF. The
final stage is another connector that stores the generated PDF files in a
database.

 .---------.      .------------------.    .------------------.    .---------.
 | Conn. A |------| Pipeline Stage A |----| Pipeline Stage B |----| Conn. B |
 |  (FTP)  |   .  |                  |    |                  |    | (JDBC)  |
 '---------'   |  '------------------'    '------------------'    '---------'
               |           .                       .                   .
               |           |                       |                   |
          Read files Transform line          Transform XML         Write PDF
                      data to XML               to PDF               to DB

Thinking about this problem, there are a few things that are needed:

 - A pipeline-based processing model
 - A generic connector API
 - Event-driven async. pipeline processing
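
As a rough illustration of the first two items, here is a minimal Java
sketch of a connector feeding two stages and a second connector. It is not
Commons Chain code: PipelineSketch, SourceConnector, SinkConnector and run()
are hypothetical names, the string-wrapping lambdas are stand-ins for real
transforms, and the event-driven async part is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; none of this is Commons Chain API.
final class PipelineSketch {

    /** A source connector yields inputs (e.g. files fetched over FTP). */
    interface SourceConnector<T> {
        List<T> read();
    }

    /** A sink connector stores outputs (e.g. rows written over JDBC). */
    interface SinkConnector<T> {
        void write(T item);
    }

    /** Runs every input through stage A, then stage B, then into the sink. */
    static <A, B, C> void run(SourceConnector<A> source,
                              Function<A, B> stageA,
                              Function<B, C> stageB,
                              SinkConnector<C> sink) {
        Function<A, C> pipeline = stageA.andThen(stageB);
        for (A input : source.read()) {
            sink.write(pipeline.apply(input));
        }
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();

        // In-memory stand-ins for the FTP and JDBC connectors.
        SourceConnector<String> ftp = () -> List.of("alice,42", "bob,7");
        SinkConnector<String> db = stored::add;

        // Stand-ins for "line data to XML" and "XML to PDF".
        Function<String, String> toXml = line -> "<row>" + line + "</row>";
        Function<String, String> toPdf = xml -> "PDF[" + xml + "]";

        run(ftp, toXml, toPdf, db);
        System.out.println(stored);
    }
}
```

Keeping the connectors as narrow interfaces means a stage never needs to
know whether its input came from FTP, JMS or anything else; the "hook" for
custom code is simply the Function passed in for each stage.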

A workflow specification (possibly XML) could define the pipeline "flow",
and this could be generated via a GUI. The WebMethods Integration platform
is the slickest example of this approach that I have seen.

My questions are thus: could Commons-Chain (+ pipeline) be used as the
basis for this type of processing? Are there any other open-source
frameworks that anyone knows of that do this already?


-----Original Message-----
From: Kris Nuttycombe []
Sent: 22 September 2004 00:47
To: Jakarta Commons Developers List
Subject: Re: [chain] Pipeline implementation

Alex Karasulu wrote:

>>subscribers that share the same index have events processed in parallel.
>>Also, perhaps instead of returning void StageHandler.handleEvent() could
>>return a boolean value that flags whether or not the event is allowed to
>>propagate to other stages with higher serial numbers.
>That's also another good idea.  This almost reminds me of rule salience
>in expert system shells.  What stage does the event have the most
>affinity for?
I hadn't thought of things in this context, but both the stage/event
handling pieces of the SEDA framework and the pipeline we've developed
here do seem a lot like frameworks for building specialized expert
systems with concurrent processing.
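
The boolean return value suggested in the quoted text above can be sketched
roughly as follows. This is an illustration under assumptions, not SEDA
code: the StageHandler here is a local stand-in that returns boolean,
OrderedRouterSketch, subscribe() and publish() are hypothetical names, and
the rule that a single "false" from any handler in a group stops propagation
is one possible policy among several. Handlers sharing a serial number run
sequentially here for simplicity, where the thread envisions them running in
parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: this StageHandler is a local stand-in that returns
// boolean, not the SEDA framework's interface.
final class OrderedRouterSketch {

    interface StageHandler {
        /** Return false to stop the event reaching higher-numbered stages. */
        boolean handleEvent(Object event);
    }

    // Handlers grouped by serial number, iterated in ascending order.
    private final TreeMap<Integer, List<StageHandler>> stages = new TreeMap<>();

    void subscribe(int serial, StageHandler handler) {
        stages.computeIfAbsent(serial, k -> new ArrayList<>()).add(handler);
    }

    /** Delivers the event group by group; returns the last serial reached, or -1. */
    int publish(Object event) {
        int lastSerial = -1;
        for (Map.Entry<Integer, List<StageHandler>> group : stages.entrySet()) {
            lastSerial = group.getKey();
            boolean propagate = true;
            // Every handler sharing this serial sees the event. They run one
            // after another here; a real router could run them in parallel.
            for (StageHandler handler : group.getValue()) {
                propagate &= handler.handleEvent(event);
            }
            if (!propagate) {
                break; // at least one handler vetoed further propagation
            }
        }
        return lastSerial;
    }

    public static void main(String[] args) {
        OrderedRouterSketch router = new OrderedRouterSketch();
        List<String> log = new ArrayList<>();
        router.subscribe(1, e -> { log.add("stage 1 saw " + e); return false; });
        router.subscribe(2, e -> { log.add("stage 2 saw " + e); return true; });
        int last = router.publish("event");
        System.out.println(log + ", last serial reached: " + last);
    }
}
```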

>>but then it seems like you have bleeding of the application logic into
>>the configuration realm. Maybe one could modify the StageHandler
>>interface by adding a method that allows you to query for the runtime
>>class of the event returned to get around this problem.
>I don't understand the "bleeding of the application logic" comment.
>Could you clarify this some more and explain how this is removed when
>the class of the event can be queried?
StageHandler's handleEvent() method is regularly responsible for raising
events and passing them back to the event router, right? The problem is
that there's nothing in the public API that makes it clear what events a
particular StageHandler may generate, so establishing a routing scheme
is a manual process that involves the programmer having knowledge of the
StageHandler's internals. In a situation where you're trying to set up a
linear routing scheme from a configuration file, it would make more
sense that the ordering of elements in that file would determine the
routing. If it's possible for a configuration tool to look at a
StageHandler and determine what events the handleEvent method has the
potential to raise, then automatic configuration becomes much simpler.
It might also be useful to define a method on the interface that allows
a handler to announce what events it can handle.
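
A minimal sketch of what such self-describing handlers might look like,
assuming a hypothetical DescribedStageHandler interface with two query
methods (neither exists in the current API as far as this thread shows).
The check treats list order as the routing order, the way a configuration
file would, and uses Class.isAssignableFrom so that a stage handling a base
event type also accepts its subtypes. JDK classes stand in for real event
types.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; DescribedStageHandler is not an existing interface.
final class DeclaredEventsSketch {

    interface DescribedStageHandler {
        Set<Class<?>> handledEventTypes(); // events this stage accepts
        Set<Class<?>> raisedEventTypes();  // events handleEvent() may raise
    }

    /** Builds a descriptor-only handler; real handlers would also do work. */
    static DescribedStageHandler stage(Set<Class<?>> handled, Set<Class<?>> raised) {
        return new DescribedStageHandler() {
            @Override public Set<Class<?>> handledEventTypes() { return handled; }
            @Override public Set<Class<?>> raisedEventTypes() { return raised; }
        };
    }

    /**
     * Treats list order as the routing order and checks that each stage
     * accepts everything its predecessor may raise.
     */
    static boolean isLinearRoutingValid(List<DescribedStageHandler> ordered) {
        for (int i = 0; i + 1 < ordered.size(); i++) {
            Set<Class<?>> accepted = ordered.get(i + 1).handledEventTypes();
            for (Class<?> raised : ordered.get(i).raisedEventTypes()) {
                // A stage that handles a base type also accepts its subtypes.
                boolean ok = accepted.stream().anyMatch(t -> t.isAssignableFrom(raised));
                if (!ok) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // JDK classes stand in for event types: String is a CharSequence.
        DescribedStageHandler first = stage(Set.of(Object.class), Set.of(String.class));
        DescribedStageHandler second = stage(Set.of(CharSequence.class), Set.of());
        DescribedStageHandler third = stage(Set.of(Integer.class), Set.of());
        System.out.println(isLinearRoutingValid(List.of(first, second)));
        System.out.println(isLinearRoutingValid(List.of(first, third)));
    }
}
```

With declarations like these, a configuration tool could reject a
mis-ordered file up front instead of relying on the programmer knowing each
handler's internals.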

>>We do things like this all the time, but I'm beginning to see how we
>>could get around it by having a base event type that related stages all
>>process and have each stage raise a subtype of that event. Seems a bit
>>like going the long way around the horn for our use case, but it might
>>add enough value to be worth it.
>Well this way may not be the best way for you.  This is our first
>attempt using the pub/sub pattern.  Questions about subtyping versus
>other means have been discussed.  Right now we simply don't know which
>way is the best way.

I think that the pub/sub model definitely has the potential to be a lot
more powerful than our current approach; it's just a matter of
developing the interfaces to make them flexible enough to support use
cases for both projects. I think that our use cases are different enough
that if we can find a model that satisfies both it will be a broadly
useful framework.

Initially Craig had suggested setting up a commons-pipeline project in
the sandbox. I've been preparing our code (licenses, submission
agreements, etc) to make this transition. Are you at all interested in
refactoring out the stage, event routing, and thread handling pieces
from the network-oriented bits of SEDA into this project? There are
definitely parts of your code that I'd like to be able to use without
forking them, although I'm sure you don't really want to introduce
extraneous dependencies.


Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337

To unsubscribe, e-mail:
For additional commands, e-mail:
