commons-dev mailing list archives

From "Kris Nuttycombe"
Subject Re: [chain] Pipeline implementation
Date Fri, 17 Sep 2004 18:10:36 GMT
Hi, Craig,

I agree that the second approach seems the most appropriate and
flexible. Given our current architecture, it should be trivial to
implement an adapter that lets a pipeline stage use a chain to
determine the processing logic for a thread.
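
As a rough illustration of that adapter idea, here is a minimal Java sketch. Every name in it (`ChainCommand`, `StageProcessor`, `ChainStageAdapter`, `Stage`, the "input"/"output" context keys) is a hypothetical stand-in rather than the actual [chain] or pipeline API, and it assumes `java.util.concurrent` is available for the blocking queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a chain command; NOT the real [chain] interface.
// Returning true means "processing is complete, stop the chain".
interface ChainCommand {
    boolean execute(Map<String, Object> context) throws Exception;
}

// Hypothetical pipeline-side interface for the logic a stage runs per object.
interface StageProcessor {
    Object process(Object input) throws Exception;
}

// Adapter: lets a list of chain commands act as a stage's processing logic.
final class ChainStageAdapter implements StageProcessor {
    static final String INPUT_KEY = "input";   // assumed context key
    static final String OUTPUT_KEY = "output"; // assumed context key
    private final List<ChainCommand> commands;

    ChainStageAdapter(List<ChainCommand> commands) {
        this.commands = new ArrayList<>(commands);
    }

    @Override
    public Object process(Object input) throws Exception {
        Map<String, Object> context = new HashMap<>();
        context.put(INPUT_KEY, input);
        for (ChainCommand command : commands) {
            if (command.execute(context)) {
                break; // a command reported that it handled the context
            }
        }
        return context.get(OUTPUT_KEY);
    }
}

// A stage owns an input queue and applies its processor to each element in turn.
final class Stage implements Runnable {
    private static final Object POISON = new Object(); // shutdown marker
    private final BlockingQueue<Object> input = new LinkedBlockingQueue<>();
    private final StageProcessor processor;
    private final BlockingQueue<Object> downstream;

    Stage(StageProcessor processor, BlockingQueue<Object> downstream) {
        this.processor = processor;
        this.downstream = downstream;
    }

    void enqueue(Object item) throws InterruptedException { input.put(item); }

    void shutdown() throws InterruptedException { input.put(POISON); }

    @Override
    public void run() {
        try {
            for (Object item = input.take(); item != POISON; item = input.take()) {
                Object result = processor.process(item);
                if (result != null) {
                    downstream.put(result); // hand off to the next stage's queue
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

final class PipelineSketchDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Object> out = new LinkedBlockingQueue<>();
        List<ChainCommand> commands = new ArrayList<>();
        commands.add(ctx -> {
            String in = (String) ctx.get(ChainStageAdapter.INPUT_KEY);
            ctx.put(ChainStageAdapter.OUTPUT_KEY, in.toUpperCase());
            return true;
        });
        Stage stage = new Stage(new ChainStageAdapter(commands), out);
        Thread worker = new Thread(stage);
        worker.start();
        stage.enqueue("orbit");
        stage.shutdown();
        worker.join();
        System.out.println(out.poll());
    }
}
```

The point of the sketch is only that the queue and thread handling lives in `Stage`, while the per-object logic sits behind a small interface that a chain can satisfy through an adapter.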

How should I proceed as far as setting up the sandbox project goes? For 
starters I can send you the current codebase and maven project files.

Thanks for your interest!


Craig McClanahan wrote:

>Hello Kris,
>
>The pipeline support you describe does indeed sound interesting, and
>especially effective in application environments where multithreaded
>processing support is appropriate.  In general, that has tended not to
>be the case in the area that motivated creation of [chain] -- web
>applications (for example, you should not be accessing a single
>HttpServletRequest instance from more than one thread), so I didn't
>think about multiple thread support when creating the original [chain]
>
>Assuming I understand what you describe well enough, I also agree that
>there could be substantial overlap between what you describe and
>[chain] as it currently exists -- to the point that sharing common
>infrastructure would seem to make a lot of sense.  Indeed, it seems
>like the common concept would be how to specify what happens "in
>between" the blocking queues, and it's the queue and thread management
>that is unique.  Does that sound right?
>
>If so, there are at least three ways we could proceed:
>
>* Add a separate Commons package ([pipeline]?) for the multithread
>  queue management stuff, which depends on [chain] for the
>  implementation of what happens on a thread.
>* Create [pipeline] as above, but have it define its own interface for
>  how you plug in the processing logic for a thread, and then have
>  [chain] or [pipeline] provide an adapter so that you can use a chain
>  to specify the processing logic.
>* Add a layer in [chain] to provide the multithread queue management
>  stuff as part of the same package, but not required to use a chain in
>  a single thread.
>
>All three approaches seem viable, but based on my current
>understanding of what you are describing it seems like the second one
>might be the best.  The queue management sounds like something that is
>generally useful in its own right, no matter how you choose to
>implement the actual processing logic.  I'd be happy to further
>discuss the overall approach too -- I'm not married to any particular
>
>If that sounds like a good idea to you, I'd be happy to work with you
>to create a Commons Sandbox package to let us share and experiment
>with the code.  I can do the commits until you or others on your team
>get voted to be committers on Jakarta Commons.
>
>Would this sort of approach be of interest?
>
>Craig McClanahan
>
>On Fri, 17 Sep 2004 11:28:07 -0600, Kris Nuttycombe wrote:
>>Hi, all,
>>
>>I'm writing to get some advice and perhaps offer some code that may be
>>useful to the commons-chain project or elsewhere.
>>
>>The group I work for does a large amount of data processing and we are
>>working on solutions for pipelined data processing. Our current
>>implementation uses a pipeline model where each stage in the pipeline
>>has an integrated blocking queue and an abstract process(Object o)
>>method that is sequentially applied to each element in the queue. When a
>>stage is finished processing, it may pass the processed object (or
>>products derived from it) onto the input queue of one or more subsequent
>>stages in the pipeline. Branching pipelines are supported, and the whole
>>mess is configured using Digester.
>>
>>There's a lot of similarity here with the chain of responsibility
>>pattern that commons-chain implements, but subtle differences as well.
>>Each stage runs in one or more separate threads and we are working to
>>allow the processing to be distributed across the network. The pipeline
>>model assumes that each object placed in the pipe is going to be
>>processed by every stage, whereas to my understanding the chain of
>>responsibility is more designed for finding an appropriate command to
>>use to process a given context. Also, the pipeline is designed to run as
>>a service where data can be provided for processing by automated
>>systems. For example, data being beamed down from a satellite can be
>>aggregated into orbits that are then passed into the pipeline for
>>generation of geolocated gridded products, statistical analysis, etc.
>>
>>Our group would really like to be able to contribute some of this code
>>back to the commons effort, since we use a ton of commons components.
>>The amount of overlap with commons-chain is significant, but I'm not
>>sure it's a perfect match because of the differing goals. Does anyone
>>out there know of other similar efforts? Is there a place for this sort
>>of code in commons? Are we just missing something fundamental about
>>commons-chain where we should simply be using that instead?
>>
>>Suggestions would be much appreciated. I'm happy to send code, examples,
>>and documentation to anyone who's interested.
>>
>>Kris Nuttycombe
>>Associate Scientist
>>Enterprise Data Systems Group
>>CIRES, National Geophysical Data Center/NOAA
>>(303) 497-6337
>>To unsubscribe, e-mail:
>>For additional commands, e-mail:

Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337
