commons-user mailing list archives

From Ken Tanaka <>
Subject Re: [PIPELINE] Questions about pipeline
Date Mon, 27 Oct 2008 19:44:48 GMT
Hi Tim,

Tim Dudgeon wrote:
> Hi Ken,
> Thanks for the rapid response.
> First, let me explain some background here.
> I am looking for Java-based pipelining solutions to incorporate into 
> an existing application. The use of pipelining is well established in 
> the  sector, with applications like Pipeline Pilot and Knime, and so 
> many of the common needs have been well established over several years 
> by these applications.
Have you also looked at Pentaho?
> Key issues that my initial investigations of Jakarta Pipeline seem to 
> identify are:
> 1. Branching is very common. This typically takes 2 forms:
> 1.1. Splitting data. A stage could (for instance) have 2 output ports, 
> "pass" and "fail". Data is processed by the stage and sent to 
> whichever port is appropriate. Different stages would be attached to 
> each port, resulting in the pipeline being branched by this pass/fail 
> decision.
> 1.2. Attaching multiple stages to a particular output port.
> The stage just sends its output onwards. It has no interest in what 
> happens once the data is sent, and is not concerned whether zero, one 
> or 100 stages receive the output. This is the stage1,2,3,4 scenario I 
> outlined previously.
> 2. Merging is also common (though less common than branching).
> By analogy with branching, I would see this conceptually as a stage 
> having multiple input ports (A and B in the merging example).
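To make the two branching forms in the list above concrete, here is a minimal sketch in plain Java. The names `OutputPort` and `PassFailStage` are hypothetical and are not Commons Pipeline classes. An output port just holds a list of receivers, so the emitting stage neither knows nor cares whether zero, one or 100 stages are attached (case 1.2), while a splitting stage chooses which of its two ports to emit on (case 1.1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Commons Pipeline API.
public class BranchingDemo {
    public static void main(String[] args) {
        PassFailStage<Integer> evenFilter = new PassFailStage<>(n -> n % 2 == 0);
        List<Integer> passed = new ArrayList<>();
        List<Integer> failed = new ArrayList<>();
        // Two independent receivers on the "pass" port (fan-out), one on "fail".
        evenFilter.pass.attach(passed::add);
        evenFilter.pass.attach(n -> System.out.println("pass: " + n));
        evenFilter.fail.attach(failed::add);
        for (int i = 1; i <= 4; i++) {
            evenFilter.process(i);
        }
        System.out.println(passed + " " + failed); // prints "[2, 4] [1, 3]"
    }
}

/** Hypothetical named output port: forwards each object to every attached receiver. */
class OutputPort<T> {
    private final List<Consumer<? super T>> receivers = new ArrayList<>();

    void attach(Consumer<? super T> receiver) {
        receivers.add(receiver);
    }

    void emit(T obj) {
        // The emitting stage does not care how many receivers exist (possibly none).
        for (Consumer<? super T> r : receivers) {
            r.accept(obj);
        }
    }
}

/** Hypothetical splitting stage: each object goes to exactly one of its two ports. */
class PassFailStage<T> {
    final OutputPort<T> pass = new OutputPort<>();
    final OutputPort<T> fail = new OutputPort<>();
    private final Predicate<? super T> test;

    PassFailStage(Predicate<? super T> test) {
        this.test = test;
    }

    void process(T obj) {
        if (test.test(obj)) {
            pass.emit(obj);
        } else {
            fail.emit(obj);
        }
    }
}
```

Keeping the receiver list on the port rather than on the stage is what lets downstream stages be added or removed without touching the emitting stage.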
At present, the structure for storing stages is a linked list, and 
branches are implemented as additional pipelines accessed by a name 
through a HashMap. To generally handle branching and merging, a directed 
acyclic graph (DAG) would better serve, but that would require the 
pipeline code to be rewritten at this level. Arguments could also be 
made for allowing cycles, as in directed graphs, but that would be 
harder to debug, and with a GUI might be a step toward a visual 
programming language, so I don't think this should be pursued yet unless 
there are volunteers...
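As an illustration of what a DAG-based layout adds over a linked list plus a `HashMap` of named branches, the sketch below stores stages (by name, for brevity) as nodes of a directed graph and uses Kahn's topological sort both to reject cycles and to produce a valid processing order. `StageGraph` is a hypothetical name, not a Commons Pipeline class. A node with several outgoing edges is a branch point, and a node with several incoming edges is a merge.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Commons Pipeline API.
public class StageGraphDemo {
    public static void main(String[] args) {
        StageGraph g = new StageGraph();
        g.connect("read", "filter");
        g.connect("filter", "pass");   // branch: "filter" has two successors
        g.connect("filter", "fail");
        g.connect("pass", "merge");    // merge: "merge" has two predecessors
        g.connect("fail", "merge");
        System.out.println(g.topologicalOrder()); // prints [read, filter, pass, fail, merge]
    }
}

/** Hypothetical directed graph of stage names; edges point downstream. */
class StageGraph {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();

    void connect(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Kahn's algorithm: returns a processing order, or throws if there is a cycle. */
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String node : successors.keySet()) {
            inDegree.putIfAbsent(node, 0);
            for (String next : successors.get(node)) {
                inDegree.merge(next, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, deg) -> {
            if (deg == 0) {
                ready.add(node);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String next : successors.get(node)) {
                // A successor becomes ready once all its predecessors are placed.
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("stage graph contains a cycle");
        }
        return order;
    }
}
```

Rejecting cycles at assembly time is one way to keep the "acyclic" restriction mentioned above enforceable rather than a convention.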

> Taken together I can see a generalisation here using named ports 
> (input and output), which is similar, but not identical, to your 
> current concept of branches.
> So you have:
> BaseStage.emit(String branch, Object obj);
> whereas I would conceptually see this as:
> emit(String port, Object obj);
> and you have:
> Stage.process(Object obj);
> whereas I would conceptually see this as:
> Stage.process(String port, Object obj);
> And when a pipeline is being assembled a downstream stage is attached 
> to a particular port of a stage, not the stage itself. It then just 
> receives data sent to that particular port, but not the other ports.
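A minimal sketch of the port-addressed API proposed above, assuming the hypothetical names `PortStage`, `PortPipeline.connect` and `PortPipeline.emit`. This is not the current Commons Pipeline `Stage` interface. It only shows that a connection runs from a (stage, output port) pair to a (stage, input port) pair, so a downstream stage receives only what is emitted on the port it is attached to, and a merging stage can tell its inputs apart (ports "A" and "B" here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Commons Pipeline API.
public class PortApiDemo {
    public static void main(String[] args) {
        PortPipeline pipeline = new PortPipeline();
        Splitter splitter = new Splitter(pipeline);
        Collector merged = new Collector();
        // Both output ports of the splitter feed one merging stage, on different input ports.
        pipeline.connect(splitter, "pass", merged, "A");
        pipeline.connect(splitter, "fail", merged, "B");
        for (int i = 1; i <= 4; i++) {
            splitter.process("input", i);
        }
        System.out.println(merged.received); // prints [B:1, A:2, B:3, A:4]
    }
}

/** Hypothetical port-addressed stage contract. */
interface PortStage {
    void process(String port, Object obj);
}

/** Hypothetical wiring: routes (stage, outPort) emissions to every attached (stage, inPort). */
class PortPipeline {
    private final Map<PortStage, Map<String, List<Target>>> wiring = new HashMap<>();

    void connect(PortStage from, String outPort, PortStage to, String inPort) {
        wiring.computeIfAbsent(from, k -> new HashMap<>())
              .computeIfAbsent(outPort, k -> new ArrayList<>())
              .add(new Target(to, inPort));
    }

    void emit(PortStage from, String outPort, Object obj) {
        List<Target> targets = wiring.getOrDefault(from, Map.of()).getOrDefault(outPort, List.of());
        for (Target t : targets) {
            t.stage.process(t.inPort, obj);
        }
    }

    private static final class Target {
        final PortStage stage;
        final String inPort;

        Target(PortStage stage, String inPort) {
            this.stage = stage;
            this.inPort = inPort;
        }
    }
}

/** Hypothetical splitter: even integers go to "pass", everything else to "fail". */
class Splitter implements PortStage {
    private final PortPipeline pipeline;

    Splitter(PortPipeline pipeline) {
        this.pipeline = pipeline;
    }

    @Override
    public void process(String port, Object obj) {
        boolean even = obj instanceof Integer && (Integer) obj % 2 == 0;
        pipeline.emit(this, even ? "pass" : "fail", obj);
    }
}

/** Hypothetical merging stage: records which input port each object arrived on. */
class Collector implements PortStage {
    final List<String> received = new ArrayList<>();

    @Override
    public void process(String port, Object obj) {
        received.add(port + ":" + obj);
    }
}
```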
I could see this working, but it would require either modifying a 
number of stages already written or creating a compatibility stage 
driver that wraps older-style stages: the input object would come from 
a configured port name, usually "input", and the output would be sent 
to configured output ports named "output" plus whatever the previous 
branch name(s) were, if any. Stages that used to listen for events as 
input should be rewritten to read multiple inputs 
(Stage.process(String port, Object obj), as you suggested). Events would 
then be reserved for truly out-of-band signals between stages rather 
than carrying data for processing.
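The compatibility stage driver described above could take the form of an adapter. The sketch below assumes a hypothetical `LegacyStage` interface standing in for an old-style stage (modelled here with an explicit emitter callback for brevity, rather than the inherited `emit` methods) and wraps it as a port-addressed stage. Objects arriving on the configured input port (usually "input") go to the old `process` method; main-line output is sent to "output", and output for a named branch is sent to a port of the same name. `PortStage`, `LegacyEmitter`, `LegacyStage`, `UpperCaseStage` and `LegacyStageAdapter` are all illustrative names, not Commons Pipeline classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical illustration only; not the Commons Pipeline API.
public class CompatDriverDemo {
    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        PortStage adapted = new LegacyStageAdapter(new UpperCaseStage(), "input",
                (port, obj) -> seen.add(port + ":" + obj));
        adapted.process("input", "abc");
        adapted.process("input", "");
        adapted.process("other", "ignored");
        System.out.println(seen); // prints [output:ABC, empty:]
    }
}

/** Hypothetical new-style contract: data arrives tagged with an input port name. */
interface PortStage {
    void process(String port, Object obj);
}

/** Hypothetical callback an old-style stage uses to pass results on. */
interface LegacyEmitter {
    void emit(Object obj);                 // main-line output
    void emit(String branch, Object obj);  // output for a named branch
}

/** Hypothetical old-style stage: a single input, results pushed through an emitter. */
interface LegacyStage {
    void process(Object obj, LegacyEmitter out);
}

/** Example old-style stage: upper-cases strings, routes empty strings to an "empty" branch. */
class UpperCaseStage implements LegacyStage {
    @Override
    public void process(Object obj, LegacyEmitter out) {
        String s = String.valueOf(obj);
        if (s.isEmpty()) {
            out.emit("empty", s);
        } else {
            out.emit(s.toUpperCase());
        }
    }
}

/** Hypothetical compatibility driver: wraps an old-style stage as a port-addressed stage. */
class LegacyStageAdapter implements PortStage {
    static final String OUTPUT_PORT = "output";
    private final LegacyStage legacy;
    private final String inputPort;
    private final BiConsumer<String, Object> downstream; // receives (outPort, obj)

    LegacyStageAdapter(LegacyStage legacy, String inputPort, BiConsumer<String, Object> downstream) {
        this.legacy = legacy;
        this.inputPort = inputPort;
        this.downstream = downstream;
    }

    @Override
    public void process(String port, Object obj) {
        if (!inputPort.equals(port)) {
            return; // old-style stages have one input; objects on other ports are ignored
        }
        legacy.process(obj, new LegacyEmitter() {
            @Override
            public void emit(Object o) {
                downstream.accept(OUTPUT_PORT, o); // main line maps to "output"
            }

            @Override
            public void emit(String branch, Object o) {
                downstream.accept(branch, o); // old branch name becomes an output port name
            }
        });
    }
}
```

Silently ignoring objects on other ports is one design choice; throwing an exception instead would surface wiring mistakes earlier.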
> I'd love to hear how compatible the current system is with this way of 
> seeing things. Are we just talking about a new type of Stage 
> implementation, or a more fundamental incompatibility at the API level?
I think you have some good ideas. This would change the Stage 
implementation, which affects on the order of 60 of our stages that 
override the process method, unless the compatibility stage driver works 
out. The top-level pipeline would also need to be restructured. The 
amount of work required puts this beyond what I can take on in the near 
term, but there may be other developers/contributors willing to take it on.

