commons-dev mailing list archives

From "Alex Karasulu" <>
Subject RE: [codec] StatefulDecoders
Date Thu, 04 Mar 2004 02:46:07 GMT
Brett, Noel,

How about we put our minds together and finalize some of this stuff so I can
start writing some codecs that can be added back to this project?  

> > In general, I have long preferred the pipeline/event model to the
> > approach that Alex had, where it would give data to the codec, and
> > then poll it for

Agreed! My approach was not the best, but have you had a chance to look
at the new interfaces that I sent out with the callbacks?  Shall I resend

> > approach to MIME content.  And I also desperately want a regex in
> > this same

Also keep in mind that we don't want to build too much on top of this
concept.  Let's agree to keep things as simple as possible, making the
codec design do what it does: provide an interface for transforming
information.  We just need this interface to be optimal for all sorts of
implementations: those based on blocking IO and those based on non-blocking
IO.  Also let's not forget about other very important requirements, one of
which is keeping the decoder's processing footprint small, or at a fixed
size, regardless of the size of the data being transformed.

Let me just list the requirements one more time:

1). Interfaces should allow for implementations that perform piecemeal
   processing
   - enables implementations to have constant-sized processing footprints
   - enables implementations to support efficient non-blocking and
     streaming IO

2). Easily understood and simple to use

3). Interfaces should in no way, shape, or form restrict or limit the
performance of implementations, whatever they may be.
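To make requirement 1 concrete: the modern JDK's own base64 codec (added well after this thread, in `java.util.Base64`) exposes a streaming form whose memory use stays fixed no matter how big the input is. A minimal sketch, assuming nothing beyond the standard library:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Base64;

public class StreamingFootprint {
    // Decode a base64 payload through the JDK's streaming decoder, reading
    // fixed-size chunks so memory use stays constant regardless of input size.
    public static String decodeAll(byte[] encoded) throws Exception {
        InputStream in = Base64.getDecoder().wrap(new ByteArrayInputStream(encoded));
        byte[] buf = new byte[4];               // small, fixed working buffer
        StringBuilder out = new StringBuilder();
        int n;
        while ((n = in.read(buf)) != -1) {
            out.append(new String(buf, 0, n));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] encoded = Base64.getEncoder().encode("hello world".getBytes());
        System.out.println(decodeAll(encoded)); // prints "hello world"
    }
}
```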

> You're right, my design has no concept of structured content. It was
> developed to solve a particular problem (ie. efficient streamable data
> manipulation).  If API support for structured content is required then my
> implementation doesn't (yet) support it.

You can build on this separately, no?  There is no need to have the codec
interfaces take this into account other than to allow for this decoration in
the future rather than inhibit it.

> I'll use engine for the want of a better word to describe an element in a
> pipeline performing some operation on the data passing through it.

SEDA calls this a stage btw.

> An API aware of structured content shouldn't complicate the creation of
> simple engines such as base64 which pay no attention to data structure.
> Ideally, a structured API would extend an unstructured API and only those
> engines requiring structured features would need to use it.


> I'm having trouble visualising a design that supports structured content
> without being specific to a particular type of structured content. Do you
> have some examples of what operations you would like a structured data API
> to support?  Do you see interactions between pipeline elements being
> strongly typed?

I agree with you here as well.  It is very hard to see this clearly.
Perhaps there may be ways for us to generalize things here.  

Just as an experiment, let's take a look at how SAX works.  It does not care
about tag types (hence the encoded structure) but does care about valid XML,
which it uses to trigger events.  In a way, to a SAX parser, the encoding is
just valid XML.
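For reference, this is what SAX's event style looks like in practice: the parser fires a callback per structural unit (element), and a handler reacts. The class and method names below (`SaxEvents`, `elementNames`) are made up for the example; the SAX API itself is the standard `org.xml.sax` one shipped with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEvents {
    // Parse a small document and record the element names in the order the
    // parser's startElement callbacks fire.
    public static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                names.add(qName); // one callback per logical unit of the encoding
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c/></a>")); // prints [a, b, c]
    }
}
```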

With codecs the encoding is variable, right?  It could be anything.
Something has to generate events/callbacks that delimit logical units of the
encoding, whatever that may be.  For some encodings that you mentioned
(base64) there may not be a data structure, but the unit of encoding must be
at least two characters for base64, I think.  Please correct me if I'm wrong.
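That intuition can be checked against the JDK's base64 implementation: the full quantum is 4 characters for 3 bytes, and a single input byte still produces a 4-character group in which only the first two characters carry data (the rest is padding). A quick illustration:

```java
import java.util.Base64;

public class Base64Unit {
    public static void main(String[] args) {
        // One input byte yields a full 4-character quantum, but only the
        // first two characters carry data; "==" is padding.
        String one = Base64.getEncoder().encodeToString(new byte[] {0});
        System.out.println(one);   // prints "AA=="

        // Three input bytes fill one quantum exactly: 4 characters, no padding.
        String three = Base64.getEncoder().encodeToString(new byte[] {0, 0, 0});
        System.out.println(three); // prints "AAAA"
    }
}
```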

So there is some minimum unit size, which can range from one byte to
anything; it is determined by the codec's encoding and reflected in some
form of callback.  SAX uses callbacks to let content-aware builders do
their thing, right?  Now I'm not suggesting that a base64 codec's
encoder/decoder pair make a callback on every one or two bytes (depending on
your direction).  In the case of such a non-structured decoder, the buffer
size, or the end of the stream, would be the determining factor.

So I think we need to use callbacks to let decoders tell us when they hit
some notable event that needs attention whatever that may be.
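A minimal sketch of what that callback style could look like, using hex decoding as a trivially stateful codec. All names here (`StatefulDecoder`, `DecoderCallback`, `HexDecoder`) are hypothetical, not a proposed final API; the point is only that the decoder accepts arbitrarily sized chunks, carries partial state across them, and notifies the caller when decoded units are ready.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical interfaces for the callback style under discussion.
interface DecoderCallback {
    void decodeOccurred(byte[] decoded); // fired when decoded data is ready
}

interface StatefulDecoder {
    void setCallback(DecoderCallback cb);
    void decode(byte[] encodedChunk);    // accepts arbitrarily sized chunks
}

class HexDecoder implements StatefulDecoder {
    private DecoderCallback cb;
    private int pending = -1; // carries a dangling nibble across chunks

    public void setCallback(DecoderCallback cb) { this.cb = cb; }

    public void decode(byte[] chunk) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : chunk) {
            int nibble = Character.digit((char) b, 16);
            if (pending < 0) {
                pending = nibble;                 // first half of a byte
            } else {
                out.write((pending << 4) | nibble); // complete byte
                pending = -1;
            }
        }
        if (out.size() > 0 && cb != null) {
            cb.decodeOccurred(out.toByteArray()); // constant footprint per chunk
        }
    }
}

public class Demo {
    public static void main(String[] args) {
        HexDecoder d = new HexDecoder();
        StringBuilder sb = new StringBuilder();
        d.setCallback(decoded -> sb.append(new String(decoded)));
        d.decode("68".getBytes()); // complete pair -> 'h'
        d.decode("6".getBytes());  // dangling nibble, no callback yet
        d.decode("9".getBytes());  // completes the pair -> 'i'
        System.out.println(sb);    // prints "hi"
    }
}
```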

> For example, a multipart mime decoding engine (consumer of byte data,
> hence a ByteConsumer) could produce MIME parts (a MIMEPartProducer). A
> MIMEPartConsumer design would receive MIMEPart objects (which are in turn
> ByteByteEngines but extended with a MIME type property) and connect them
> to a consumer capable of handling the byte data contained in the MIME part.

Right, I agree; that would build on the basics we define here in a common
codec library.

> > operations.  These are pipelines; receiving content on one end,
> > performing operations, and generating events down a chain.  More than
> > one event could be generated at any point, and the chain can have
> > multiple paths.

This, the pipelining notion, IMHO is overly complicated for building out
codec interfaces.  The pipeline can be built from the smaller simpler parts
we are discussing now.  We must try harder to constrain the scope of a
codec's definition.  

Noel, as you know, I have built servers based on pipelined components before
and am trying it all over again.  We must spare those wanting to implement
simple codecs like base64 from these concepts, let alone the language around
them.  The intended use of codecs by some folks may not be so grandiose.
They may simply need to convert a byte buffer and be done with it.  There is
no reason why we should cloud the picture for the simple user.

Official disclaimer: Alex may not know what he's talking about, and these
are his views at the present moment, which do not necessarily reflect those
of other personalities that he may exhibit at a later date.

