commons-dev mailing list archives

From: Craig McClanahan <craig...@gmail.com>
Subject: Re: [chain] Pipeline implementation
Date: Fri, 17 Sep 2004 17:57:00 GMT
Hello Kris,

The pipeline support you describe does indeed sound interesting, and
especially effective in application environments where multithreaded
processing is appropriate.  In general, that has tended not to be the
case in the area that motivated the creation of [chain] -- web
applications (for example, you should not access a single
HttpServletRequest instance from more than one thread), so I didn't
think about multithreading support when creating the original [chain]
architecture.

Assuming I understand what you describe well enough, I also agree
that there could be substantial overlap between your pipeline and
[chain] as it currently exists -- to the point that sharing common
infrastructure would seem to make a lot of sense.  Indeed, it seems
like the common concept would be how to specify what happens "in
between" the blocking queues, and it is the queue and thread
management that is unique.  Does that sound right?

If so, there are at least three ways we could proceed:

* Add a separate Commons package ([pipeline]?) for the multithreaded
  queue management stuff, which depends on [chain] for the
  implementation of what happens on a thread.

* Create [pipeline] as above, but have it define its own interface for
  how you plug in the processing logic for a thread, and then have
  [chain] or [pipeline] provide an adapter so that you can use a chain
  to specify the processing logic (a rough sketch of such an adapter
  follows this list).

* Add a layer in [chain] to provide the multithreaded queue management
  stuff as part of the same package, but not require it when using a
  chain in a single thread.
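For the second option, the adapter could be quite thin.  Here is a
sketch, with a hypothetical Stage interface that [pipeline] might
define -- nothing like this exists in [chain] today, and the context
key names are made up:

    import org.apache.commons.chain.Command;
    import org.apache.commons.chain.Context;
    import org.apache.commons.chain.impl.ContextBase;

    // Hypothetical [pipeline] contract: what a stage does to one element.
    // (In a real package this would live in its own source file.)
    interface Stage {
        Object process(Object element) throws Exception;
    }

    // Adapter that lets any [chain] Command (including a whole Chain)
    // serve as the processing logic for a pipeline stage.
    public class ChainStage implements Stage {

        private final Command command;

        public ChainStage(Command command) {
            this.command = command;
        }

        public Object process(Object element) throws Exception {
            Context context = new ContextBase();
            context.put("input", element);     // illustrative key names
            command.execute(context);
            return context.get("output");      // whatever the chain produced
        }
    }

That way the queue and thread management code never needs to know
anything about [chain] directly; only the adapter does.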

All three approaches seem viable, but based on my current
understanding of what you are describing, it seems like the second
one might be best.  The queue management sounds like something that
is generally useful in its own right, no matter how you choose to
implement the actual processing logic.  I'd be happy to discuss the
overall design further, too -- I'm not married to any particular
approach.

If that sounds like a good idea to you, I'd be happy to work with you
to create a Commons Sandbox package to let us share and experiment
with the code.  I can do the commits until you or others on your team
get voted in as committers on Jakarta Commons.

Would this sort of approach be of interest?

Craig McClanahan


On Fri, 17 Sep 2004 11:28:07 -0600, Kris Nuttycombe
<kris.nuttycombe@noaa.gov> wrote:
> Hi, all,
> 
> I'm writing to get some advice and perhaps offer some code that may be
> useful to the commons-chain project or elsewhere.
> 
> The group I work for does a large amount of data processing and we are
> working on solutions for pipelined data processing. Our current
> implementation uses a pipeline model where each stage in the pipeline
> has an integrated blocking queue and an abstract process(Object o)
> method that is sequentially applied to each element in the queue. When a
> stage is finished processing, it may pass the processed object (or
> products derived from it) onto the input queue of one or more subsequent
> stages in the pipeline. Branching pipelines are supported, and the whole
> mess is configured using Digester.
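> To give a flavor of the model (heavily simplified; the names and the
> use of java.util.concurrent here are just for illustration, not our
> actual code), a stage looks roughly like this:
>
> import java.util.ArrayList;
> import java.util.Iterator;
> import java.util.List;
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.LinkedBlockingQueue;
>
> abstract class Stage implements Runnable {
>
>     // Each stage owns its input queue; downstream stages register
>     // theirs via addDownstream(), so branching falls out naturally.
>     private final BlockingQueue queue = new LinkedBlockingQueue();
>     private final List downstream = new ArrayList();   // of Stage
>
>     public void enqueue(Object o) throws InterruptedException {
>         queue.put(o);
>     }
>
>     public void addDownstream(Stage next) {
>         downstream.add(next);
>     }
>
>     // Subclasses implement the per-element work; the return value (or
>     // products derived from it) is passed to every downstream stage.
>     protected abstract Object process(Object o) throws Exception;
>
>     public void run() {
>         try {
>             while (true) {
>                 Object result = process(queue.take());
>                 if (result != null) {
>                     for (Iterator i = downstream.iterator(); i.hasNext();) {
>                         ((Stage) i.next()).enqueue(result);
>                     }
>                 }
>             }
>         } catch (Exception e) {
>             // error handling and shutdown elided
>         }
>     }
> }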
> 
> There's a lot of similarity here to the chain of responsibility
> pattern that commons-chain implements, but there are subtle
> differences as well.
> Each stage runs in one or more separate threads and we are working to
> allow the processing to be distributed across the network. The pipeline
> model assumes that each object placed in the pipe is going to be
> processed by every stage, whereas, to my understanding, the chain of
> responsibility is designed more for finding an appropriate command
> with which to process a given context. Also, the pipeline is designed
> to run as
> a service where data can be provided for processing by automated
> systems. For example, data being beamed down from a satellite can be
> aggregated into orbits that are then passed into the pipeline for
> generation of geolocated gridded products, statistical analysis, etc.
> 
> Our group would really like to be able to contribute some of this code
> back to the commons effort, since we use a ton of commons components.
> The amount of overlap with commons-chain is significant, but I'm not
> sure it's a perfect match because of the differing goals. Does anyone
> out there know of other similar efforts? Is there a place for this sort
> of code in commons? Are we just missing something fundamental about
> commons-chain where we should simply be using that instead?
> 
> Suggestions would be much appreciated. I'm happy to send code, examples,
> and documentation to anyone who's interested.
> 
> Thanks,
> Kris
> 
> --
> =====================================================
> Kris Nuttycombe
> Associate Scientist
> Enterprise Data Systems Group
> CIRES, National Geophysical Data Center/NOAA
> (303) 497-6337
> Kris.Nuttycombe@noaa.gov
> =====================================================
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

