commons-dev mailing list archives

From "Alex Karasulu" <>
Subject RE: [codec] StatefulDecoders
Date Tue, 09 Mar 2004 14:43:34 GMT

Please excuse the long email.  My comments are inline.

> -----Original Message-----
> From: Brett Henderson []
> > How about we put our minds together and finalize some of this stuff
> > so I can start writing some codecs that can be added back to this
> > project?
> Yeah definitely, sounds like we're trying to solve the same problem here.
> I haven't responded to your previous emails because I haven't contributed
> before and was leaving opinions to those who've actually proven
> themselves.

Oh, I only contributed once, and it's a measly contribution at best.  As a
community we shouldn't have these notions.  I think the commons committers
and the PMC would agree.  So please, I'm far from proven and was looking for
folks like yourself to help me along.

> > 2). Easily understood and simple to use
> Agreed, although needs to be weighed up with any conflicting requirements.

Yeah there probably will be some tradeoffs here.

> > 3). Interfaces should in no way, shape or form restrict or limit the
> > performance of implementations, whatever they may be.
> Agreed, although without knowing all of these implementations in advance
> we can never be sure ;-)

Yes, this is perhaps the hardest goal, and one we will never be able to
fully meet, but we can think hard about it.


> > You can build on this separately no?  There is no need to have the
> > codec interfaces take this into account other than allow this
> > decoration in the future rather than inhibit it.
> >
> Yes, I can build on it separately, however a new set of producers and
> consumers are needed for each type of structured data.  I don't see this
> as a problem because trying to make this too generic may lead to loss of
> performance and a complicated API.

True, it could, but that presumes we follow the model both you and Noel
favor.  I don't think this is as much of a concern with a stateful decoder
that uses a callback mechanism.

This approach treats decoders much the way parsers like SAX work.  They
generate low-level events specific to the encoding.  In this way they are
similar to your notion of producers.  Others can build on these fundamental
building blocks to take content into account.  Look at the way DOM is built
on SAX, or the way APIs like Digester operate on these events to derive more
meaning from them based on content.  I'm viewing decoders as simple event
generators driven by encoding structure, and modeling the callback as those
events.  This way the decoder does the minimum it has to, leaving any
higher-level content interpretation to other facilities built on top.
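
To make the event-generator idea concrete, here is a rough sketch.  It is
not the set of interfaces posted to the JIRA: the names StatefulDecoder,
DecoderCallback, HexDecoder and decodeChunks are made up for illustration,
and hex was picked only because its state (one dangling nibble) is tiny.
The point is that the decoder keeps state between calls and reports each
decoded run through a callback, knowing nothing about content.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CallbackDecoderSketch {

    /** Hypothetical callback: told whenever the decoder has decoded something. */
    interface DecoderCallback {
        void decodeOccurred(StatefulDecoder decoder, byte[] decoded);
    }

    /** Hypothetical stateful decoder: input may arrive in arbitrary chunks. */
    interface StatefulDecoder {
        void setCallback(DecoderCallback callback);
        void decode(byte[] encodedChunk);
    }

    /** Hex decoder that remembers a dangling nibble between decode() calls. */
    static class HexDecoder implements StatefulDecoder {
        private DecoderCallback callback;
        private int pendingHighNibble = -1; // -1 means nothing carried over

        public void setCallback(DecoderCallback callback) {
            this.callback = callback;
        }

        public void decode(byte[] encodedChunk) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte b : encodedChunk) {
                int nibble = Character.digit((char) b, 16);
                if (nibble < 0) {
                    throw new IllegalArgumentException("not a hex digit: " + (char) b);
                }
                if (pendingHighNibble < 0) {
                    pendingHighNibble = nibble;      // first half of a byte
                } else {
                    out.write((pendingHighNibble << 4) | nibble);
                    pendingHighNibble = -1;          // byte complete
                }
            }
            // One event per chunk that yielded output, not one per byte.
            if (out.size() > 0 && callback != null) {
                callback.decodeOccurred(this, out.toByteArray());
            }
        }
    }

    /** Feeds the chunks through one decoder and collects the callback output. */
    static String decodeChunks(String... chunks) {
        HexDecoder decoder = new HexDecoder();
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        decoder.setCallback((d, bytes) -> collected.write(bytes, 0, bytes.length));
        for (String chunk : chunks) {
            decoder.decode(chunk.getBytes(StandardCharsets.US_ASCII));
        }
        return new String(collected.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The chunk boundaries deliberately split hex pairs.
        System.out.println(decodeChunks("486", "56c6c", "6f")); // prints Hello
    }
}
```

Anything content aware (a builder, a digester-style rule engine) would sit
behind the callback rather than inside the decoder.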


> > So there is some minimum unit size that can range from one byte to
> > anything and this is determined by the codec's encoding and reflected
> > in some form of callback.  SAX uses callbacks to allow builders that
> > are content aware do their thing right?  Now I'm not suggesting that a
> > base64 codec's encoder/decoder pairs make callbacks on every 2 or
> > single byte (depending on your direction).  In the case of such a
> > non-structured decoder the buffer size would be the determining factor
> > or the end of the stream.
> Agreed.
> >
> > So I think we need to use callbacks to let decoders tell us when they
> > hit some notable event that needs attention whatever that may be.
> I agree in principle here although I'm not sure that I agree with the
> structure of callbacks.  I'll explain more later.

Ok, I'll be waiting to hear more - will try to clear my head until then :-).

> > There is no reason why we should cloud this picture for the simple
> > user.
> I agree that we definitely don't want to introduce complexity and
> computational overhead for simple cases.  However I think many of the
> above concepts can be supported without creating complex APIs.
> I believe these are the interfaces you have previously posted.  Let me
> know if I've got the wrong ones :-)

These were updated; here's a new link in the JIRA:

Feel free by the way to make any additional comments on the JIRA itself.

Are we not presuming that stages will exist when they are not really a
requirement?  What I'm trying to say is that designing the interfaces around
stages presumes there will always be a pipeline.  What we should do is make
sure that pipelining is possible without presuming that decoders will always
be used that way.  I personally will be using these decoders in a pipeline
within the Eve server for sure.


> Firstly, I don't see the need to distinguish between encoding and decoding
> because it is conceivable for stages to perform some operation that
> doesn't fit into the traditional encoding/decoding model (eg. a stage
> wrapping lines every 80 characters).  This means that a single interface
> is implemented by both encoders and decoders.

I guess this is more of a filter concept.  You can have linear combinations
of filters.  But a decoder and its encoder counterpart are special types
of filter, no?  The codec library is here to address the specific subset of
encoder/decoder filter concerns rather than the whole gamut of filter
concerns.  This is just my interpretation; anyone please correct me if I'm
wrong.

> Secondly, when I first started designing my implementation I had a similar
> concept of stages (engines in my implementation) and callbacks (receivers
> in my implementation).  Then I wanted to couple two stages together.  I
> needed a connector object to pass output from one stage into another
> stage.  I realised that I could eliminate the connector object by passing
> data directly from one stage to the next.  To solve this I came up with
> the notion of producers and consumers.  A consumer receives data and does
> something with it.  A producer "produces" data and passes it to a matching
> consumer.

You really should look at SEDA.  I think we're trying to put concepts into
the codec that really belong in their own separate commons project.  I think
it was Craig McClanahan who at some point had asked about putting together a
simple little SEDA library in the commons.  If you have a minute, take a look
at this really (I can't overstate that) simple SEDA implementation used in
the Eve server:

Also, if you have a minute, you might want to take a look at Matt Welsh's SEDA
architecture, where pipelining is the main theme.  Here's the main URL for

BTW SEDA is all about event sources and sinks, which is just another way of
stating that it's producer/consumer oriented.

> The notion of producers and consumers is very flexible because it allows
> classes to be defined which implement one or both interfaces as required
> by their purpose.  A stage (or engine in my implementation) is both a
> producer and consumer performing some operation on the data before passing
> it along.  An input adaptor (simply called a producer) is the first stage
> in a pipeline; it obtains data from some outside source and sends it into
> the framework.  An output adaptor (called a consumer) is the final stage
> in a pipeline and passes it onto some external destination (ie. an
> OutputStream or perhaps NIO).
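
As I read the description above, the shape is roughly the following.  This
is only a sketch under my own assumptions, not Brett's implementation: the
names Consumer, Producer, LineWrapStage, StringSink and wrap are made up,
String is used as the data type just to keep it short, and the wrapping
stage ignores newlines already present in the input.

```java
public class PipelineSketch {

    /** Receives data and does something with it. */
    interface Consumer {
        void consume(String data);
    }

    /** Produces data and hands it to exactly one matching consumer. */
    interface Producer {
        void setConsumer(Consumer consumer);
    }

    /** A stage is both: it transforms what it receives and passes it along. */
    static class LineWrapStage implements Consumer, Producer {
        private final int width;
        private int column = 0;   // state carried across consume() calls
        private Consumer next;

        LineWrapStage(int width) {
            this.width = width;
        }

        public void setConsumer(Consumer consumer) {
            this.next = consumer;
        }

        public void consume(String data) {
            StringBuilder out = new StringBuilder();
            for (char c : data.toCharArray()) {
                if (column == width) {   // line is full, break before this char
                    out.append('\n');
                    column = 0;
                }
                out.append(c);
                column++;
            }
            next.consume(out.toString());
        }
    }

    /** Output adaptor: the final consumer in the pipeline. */
    static class StringSink implements Consumer {
        final StringBuilder text = new StringBuilder();

        public void consume(String data) {
            text.append(data);
        }
    }

    /** Plays the input adaptor role: pushes chunks into the first stage. */
    static String wrap(int width, String... chunks) {
        LineWrapStage stage = new LineWrapStage(width);
        StringSink sink = new StringSink();
        stage.setConsumer(sink);   // stages are coupled directly, no connector
        for (String chunk : chunks) {
            stage.consume(chunk);
        }
        return sink.text.toString();
    }

    public static void main(String[] args) {
        // Width 4 instead of 80 so the effect is visible in a short string.
        System.out.println(wrap(4, "abcde", "fghij"));
    }
}
```

Note that the stage neither encodes nor decodes, which is the case made
earlier for a single interface.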

Please take a look at SEDA.  Let's focus on taking the ideas you have here
along with this simple SEDA library, making it more useful, and moving it
into commons sandbox or something if that's possible.  Perhaps we can
continue this specific conversation under a SEDA for commons trail.

> A minor point is that producers should only have to handle one matching
> consumer.  Making every producer capable of passing data to multiple
> consumers complicates simple encoders and decoders.  I'd prefer to create
> separate stages for this purpose (eg. a splitter stage) when data needs to
> be passed to multiple destinations.  This is simpler and more efficient
> than every stage managing a consumer list when the majority of uses don't
> require it.
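
A splitter stage along those lines might look like the sketch below.  The
names Consumer, SplitterStage and CollectingSink are hypothetical, and the
per-branch copy is my own assumption about keeping branches from sharing a
mutable buffer; it is not something described above.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class SplitterSketch {

    /** Hypothetical consumer of raw bytes. */
    interface Consumer {
        void consume(byte[] data);
    }

    /** Fan-out lives here, so ordinary stages only ever hold one consumer. */
    static class SplitterStage implements Consumer {
        private final List<Consumer> targets;

        SplitterStage(Consumer... targets) {
            this.targets = Arrays.asList(targets);
        }

        public void consume(byte[] data) {
            for (Consumer target : targets) {
                target.consume(data.clone()); // each branch gets its own copy
            }
        }
    }

    /** Trivial sink that remembers everything it was given. */
    static class CollectingSink implements Consumer {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        public void consume(byte[] data) {
            buffer.write(data, 0, data.length);
        }

        byte[] bytes() {
            return buffer.toByteArray();
        }
    }

    public static void main(String[] args) {
        CollectingSink first = new CollectingSink();
        CollectingSink second = new CollectingSink();
        Consumer splitter = new SplitterStage(first, second);

        splitter.consume(new byte[] {1, 2});
        splitter.consume(new byte[] {3});

        System.out.println(Arrays.toString(first.bytes()));  // prints [1, 2, 3]
        System.out.println(Arrays.toString(second.bytes())); // prints [1, 2, 3]
    }
}
```

An upstream stage sees the splitter as just another single consumer, so the
one-consumer rule holds everywhere else in the pipeline.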

Funny you mention this.  When building protocol servers based on events
(call it SEDA or a derivative of it) I found separating the event handling
into a distinct component very useful.  It also reduces the amount of
coupling between stages, making it easier to reroute the flow of events
within the server.  I have some information about this and how I've used
this approach here:

> The next area that my implementation differs in is strong typing of data.
> Perhaps strong typing is not critical (collections don't use it) although
> I do like to utilise strong typing whenever possible because it eliminates
> many bugs and in many cases can make coding more intuitive.  The
> interfaces defined above define the "event" or data to be of type Object.
> When using the producer and consumer concept, the base producer/consumer
> interfaces can be extended for each data type being supported.  For
> example, byte processing requires ByteConsumer and ByteProducer
> interfaces.  The advantage of this approach is that a ByteProducer stage
> can never attempt to pass data to a consumer of a different type (the
> compiler will enforce this).
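
A sketch of what that typing could buy, under my own assumptions: only the
names ByteConsumer and ByteProducer come from the text above, while their
method signatures, CharConsumer, CharProducer, Latin1DecoderStage and
CharSink are invented for illustration.  ISO-8859-1 is used because each
byte maps straight to one character, so the stage needs no carried state.

```java
import java.nio.charset.StandardCharsets;

public class TypedStagesSketch {

    interface ByteConsumer { void consume(byte[] data); }
    interface ByteProducer { void setConsumer(ByteConsumer consumer); }

    // Hypothetical character-typed counterparts.
    interface CharConsumer { void consume(char[] data); }
    interface CharProducer { void setConsumer(CharConsumer consumer); }

    /** Consumes bytes and produces chars; the types enforce both ends. */
    static class Latin1DecoderStage implements ByteConsumer, CharProducer {
        private CharConsumer next;

        public void setConsumer(CharConsumer consumer) {
            this.next = consumer;
        }

        public void consume(byte[] data) {
            char[] chars = new char[data.length];
            for (int i = 0; i < data.length; i++) {
                chars[i] = (char) (data[i] & 0xFF); // ISO-8859-1: byte value == code point
            }
            next.consume(chars);
        }
    }

    /** Final consumer that accumulates characters. */
    static class CharSink implements CharConsumer {
        final StringBuilder text = new StringBuilder();

        public void consume(char[] data) {
            text.append(data);
        }
    }

    static String decode(byte[] input) {
        Latin1DecoderStage stage = new Latin1DecoderStage();
        CharSink sink = new CharSink();
        stage.setConsumer(sink);
        // Handing this stage a ByteConsumer instead would be rejected by the
        // compiler, because setConsumer only accepts a CharConsumer.
        stage.consume(input);
        return sink.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("typed".getBytes(StandardCharsets.ISO_8859_1))); // prints typed
    }
}
```

With Object-typed events the same mismatch would only show up as a
ClassCastException at run time.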

I agree that strong typing is a good thing.  I'm a big fan of it.  As a
matter of fact, it's one of the reasons why I can't bear to look at Perl :-).

Take a look at the "Reclaiming Type Safety" section in this article on the 
event notification pattern here:


> If simplicity of use is required (for example, simple in-memory data
> conversion) a simple wrapper class can be defined (implementation removed

See, I think it's too much if you have to do that just to simplify.  But I
have this exact same problem when trying to make SEDA stages act as simple
services that perform a synchronous task out of band, rather than the
asynchronous, in-band processing of the incoming event stream.

I don't want to make this trail any longer than it already is.  I think
we'll lose certain folks trying to follow it.  But let's just see if
we can agree that we need to separate codec concerns from SEDA concerns.

Let's then start another SEDA trail and see if we can both work together 
to create a simple SEDA API.  Are you up for that?

