To: user@commons.apache.org
From: Tim Dudgeon
Subject: Re: [PIPELINE] Questions about pipeline
Date: Tue, 28 Oct 2008 12:10:36 +0000
Ken Tanaka wrote:
> Hi Tim,
>
> Tim Dudgeon wrote:
>> Hi Ken,
>>
>> Thanks for the rapid response.
>> First, let me explain some background here.
>> I am looking for Java-based pipelining solutions to incorporate into
>> an existing application. The use of pipelining is well established in
>> the sector, with applications like Pipeline Pilot and Knime, and so
>> many of the common needs have been well established over several years
>> by these applications.
> Have you also looked at Pentaho?

I took a look, but it doesn't seem to be what I'm after.

>>
>> Key issues that my initial investigations of Jakarta Pipeline seem to
>> identify are:
>>
>> 1. Branching is very common. This typically takes 2 forms:
>> 1.1. Splitting data. A stage could (for instance) have 2 output ports,
>> "pass" and "fail". Data is processed by the stage and sent to
>> whichever port is appropriate. Different stages would be attached to
>> each port, resulting in the pipeline being branched by this pass/fail
>> decision.
>> 1.2. Attaching multiple stages to a particular output port.
>> The stage just sends its output onwards. It has no interest in what
>> happens once the data is sent, and is not concerned whether zero, one
>> or 100 stages receive the output. This is the stage1,2,3,4 scenario I
>> outlined previously.
>>
>> 2. Merging is also common (though less common than branching).
>> By analogy with branching, I would see this conceptually as a stage
>> having multiple input ports (A and B in the merging example).
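As a rough illustration of the port model in points 1 and 2 (splitting to "pass"/"fail" ports, attaching several stages to one output port, and merging through multiple input ports), here is a minimal synchronous Java sketch. All of the names (PortStage, Emitter, PortGraph, ParitySplitter, Collector) are hypothetical and are not part of Commons Pipeline. It assumes an acyclic wiring, since send() simply recurses downstream, and it uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a named-port pipeline; none of these types exist in Commons Pipeline.
class PortDemo {

    // Callback a stage uses to send an object to one of its named output ports.
    interface Emitter {
        void emit(String outPort, Object obj);
    }

    // A stage receives each object together with the name of the input port it arrived on.
    interface PortStage {
        void process(String inPort, Object obj, Emitter out);
    }

    // Synchronous wiring: each (stage, output port) may feed any number of
    // (stage, input port) targets, so one port can fan out to zero, one or many stages.
    // Assumes the wiring is acyclic; send() simply recurses downstream.
    static class PortGraph {
        private record Target(PortStage stage, String inPort) {}

        private final Map<PortStage, Map<String, List<Target>>> wires = new HashMap<>();

        void connect(PortStage from, String outPort, PortStage to, String inPort) {
            wires.computeIfAbsent(from, k -> new HashMap<>())
                 .computeIfAbsent(outPort, k -> new ArrayList<>())
                 .add(new Target(to, inPort));
        }

        // Delivers obj to one stage's input port and pushes whatever it emits downstream.
        void send(PortStage stage, String inPort, Object obj) {
            stage.process(inPort, obj, (outPort, out) -> {
                List<Target> targets = wires.getOrDefault(stage, Map.of())
                                            .getOrDefault(outPort, List.of());
                for (Target t : targets) {
                    send(t.stage(), t.inPort(), out);
                }
            });
        }
    }

    // Example splitter: even integers go to "pass", odd ones to "fail".
    static class ParitySplitter implements PortStage {
        @Override
        public void process(String inPort, Object obj, Emitter out) {
            int n = (Integer) obj;
            out.emit(n % 2 == 0 ? "pass" : "fail", n);
        }
    }

    // Example sink: records each object prefixed with the input port it arrived on.
    static class Collector implements PortStage {
        final List<String> seen = new ArrayList<>();

        @Override
        public void process(String inPort, Object obj, Emitter out) {
            seen.add(inPort + ":" + obj);
        }
    }

    public static void main(String[] args) {
        PortGraph g = new PortGraph();
        ParitySplitter split = new ParitySplitter();
        Collector evens = new Collector();
        Collector merged = new Collector();
        g.connect(split, "pass", evens, "input"); // branch on the "pass" port
        g.connect(split, "pass", merged, "A");    // fan-out: same port, second stage
        g.connect(split, "fail", merged, "B");    // merge: two ports into one stage
        for (int i = 1; i <= 4; i++) {
            g.send(split, "input", i);
        }
        System.out.println(evens.seen);  // [input:2, input:4]
        System.out.println(merged.seen); // [B:1, A:2, B:3, A:4]
    }
}
```

The splitter never knows how many stages hang off "pass"; that decision lives entirely in the wiring, which is the property described in 1.2.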
>>
> At present, the structure for storing stages is a linked list, and
> branches are implemented as additional pipelines accessed by a name
> through a HashMap. To generally handle branching and merging, a directed
> acyclic graph (DAG) would better serve, but that would require the
> pipeline code to be rewritten at this level. Arguments could also be
> made for allowing cycles, as in directed graphs, but that would be
> harder to debug, and with a GUI might be a step toward a visual
> programming language--so I don't think this should be pursued yet unless
> there are volunteers...
>

I agree, a DAG would be better, but cycles could be needed too, so a directed graph (DG) would be better still. But, yes, ideally I want a visual designer too.

>>
>> Taken together I can see a generalisation here using named ports
>> (input and output), which is similar, but not identical, to your
>> current concept of branches.
>>
>> So you have:
>> BaseStage.emit(String branch, Object obj);
>> whereas I would conceptually see this as:
>> emit(String port, Object obj);
>> and you have:
>> Stage.process(Object obj);
>> whereas I would conceptually see this as:
>> Stage.process(String port, Object obj);
>>
>> And when a pipeline is being assembled a downstream stage is attached
>> to a particular port of a stage, not the stage itself. It then just
>> receives data sent to that particular port, but not the other ports.
> I could see that this would work, but would need either modifying a
> number of stages already written, or maybe creating a compatibility
> stage driver that takes older style stages so that the input object
> comes from a configured port name, usually "input", and sends the
> output to configured output ports named "output" and whatever the
> previous branch name(s) were, if any. Stages that used to look for
> events for input should be rewritten to read multiple inputs
> (Stage.process(String port, Object obj) as you suggested).
> Events would then be reserved for truly out-of-band signals between
> stages rather than carrying data for processing.

Agreed, I think this would be good. I think existing stages could be made compatible by having a default input and output port, and using those if no specific port was specified. A default in/out port would probably be necessary to allow simple auto-wiring.

>>
>> I'd love to hear how compatible the current system is with this way of
>> seeing things. Are we just talking about a new type of Stage
>> implementation, or a more fundamental incompatibility at the API level?
>>
> I think you have some good ideas. This is changing the Stage
> implementation, which affects on the order of 60 stages for us that
> override the process method, unless the compatibility stage driver works
> out. The top level pipeline would also be restructured. The amount of
> work required puts this out of the near term for me to work on it, but
> there may be other developers/contributors to take this on.

I need to investigate this more fully and consider the other options, but this is certainly of interest.
So, to prototype this, is all that's necessary to create a new Stage implementation with new emit( ... ) and process( ... ) methods?

Thanks
Tim

>
> -Ken
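The compatibility idea discussed in this thread (wrapping existing single-input stages so they read from a default "input" port and write to "output", with old branch names becoming output port names) could be prototyped roughly as below. LegacyStage, PortStage and LegacyAdapter are hypothetical stand-ins, not Commons Pipeline types; the real Stage interface has additional lifecycle methods that a full driver would also need to forward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical compatibility shim; LegacyStage only approximates an existing single-input stage.
class LegacyAdapterDemo {
    static final String DEFAULT_IN = "input";
    static final String DEFAULT_OUT = "output";

    // Port-based contract in the shape proposed above: (port, object) in, (port, object) out.
    interface PortStage {
        void process(String inPort, Object obj, BiConsumer<String, Object> emit);
    }

    // Stand-in for an old-style stage: one input, and emit(branch, obj) where a
    // null branch means "the normal downstream path".
    interface LegacyStage {
        void process(Object obj, BiConsumer<String, Object> emit);
    }

    // Adapter: accepts input only on its configured port (default "input"), treats a
    // missing port name as that configured port, maps the normal path to "output",
    // and keeps legacy branch names as output port names.
    static class LegacyAdapter implements PortStage {
        private final LegacyStage wrapped;
        private final String inPort;

        LegacyAdapter(LegacyStage wrapped, String inPort) {
            this.wrapped = wrapped;
            this.inPort = inPort;
        }

        LegacyAdapter(LegacyStage wrapped) {
            this(wrapped, DEFAULT_IN);
        }

        @Override
        public void process(String port, Object obj, BiConsumer<String, Object> emit) {
            String effective = (port == null) ? inPort : port;
            if (!effective.equals(inPort)) {
                throw new IllegalArgumentException("no input port named '" + effective + "'");
            }
            wrapped.process(obj, (branch, out) -> emit.accept(branch == null ? DEFAULT_OUT : branch, out));
        }
    }

    public static void main(String[] args) {
        // An old-style stage: upper-cases strings and sends empty ones to a legacy "empty" branch.
        LegacyStage upper = (obj, emit) -> {
            String s = (String) obj;
            if (s.isEmpty()) {
                emit.accept("empty", s);
            } else {
                emit.accept(null, s.toUpperCase());
            }
        };
        PortStage adapted = new LegacyAdapter(upper);
        List<String> log = new ArrayList<>();
        BiConsumer<String, Object> sink = (port, o) -> log.add(port + "=" + o);
        adapted.process("input", "abc", sink);
        adapted.process(null, "", sink); // no port given: the configured input port is used
        System.out.println(log); // [output=ABC, empty=]
    }
}
```

An adapter along these lines is one way to avoid touching the roughly 60 existing process-method overrides Ken mentions; the trade-off is that an adapted stage can only ever have a single input port.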