commons-dev mailing list archives

From "Brett Henderson" <jaka...@bretth.com>
Subject RE: [codec] StatefulDecoders
Date Wed, 10 Mar 2004 21:54:04 GMT
> -----Original Message-----
> From: Alex Karasulu [mailto:aok123@bellsouth.net]
> Sent: Wednesday, 10 March 2004 1:44 AM
> To: 'Jakarta Commons Developers List'; brett@bretth.com
> Subject: RE: [codec] StatefulDecoders
>
> Brett,
>
> Please excuse the long email.  My comments are inline.

No probs, I've been writing a few of them myself ;-)

> > > You can build on this separately no?  There is no need to have the
> > > codec interfaces take this into account other than allow this
> > > decoration in the future rather than inhibit it.
> >
> > Yes, I can build on it separately, however a new set of producers and
> > consumers are needed for each type of structured data.  I don't see
> > this as a problem because trying to make this too generic may lead to
> > loss of performance and a complicated API.
>
> True it could but that is to presume that we follow the model both you
> and Noel are in favor of however I don't think this is as much of a
> concern with the stateful decoder with a callback mechanism.

I don't understand what difference the callback mechanism makes because I
think both approaches are equivalent here.

Perhaps our two approaches are closer in design than you suspect.  I'll talk
about ByteProducer and ByteConsumer from my implementation to illustrate
this.  These interfaces are used to pass events of type byte[].  A
ByteConsumer is attached to a ByteProducer and receives all data generated
by the ByteProducer implementation (eg. Base64Encoder).  In other words,
ByteConsumer is the callback object for ByteProducer.

Your two major interfaces are StatefulDecoder and DecoderCallback.
StatefulDecoder is the equivalent of the combination of ByteProducer and
ByteConsumer (ie. ByteByteEngine).  DecoderCallback is the equivalent of
ByteConsumer only.

The major difference I see is that I have separated the input interface from
the output interface.  This eliminates the need for the callback interface
because the callback interface is simply the input interface.  I believe
this is a simpler and more flexible approach because there is no distinction
between encoders/decoders and callback objects; an encoder/decoder can act
as the callback object for another encoder/decoder if necessary.
Producers/consumers provide all the benefits of callbacks and add the
ability to chain stages directly.
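
To make the comparison concrete, here is a rough sketch of what I mean.  The
interface names are the ones from my implementation, but the method names
and signatures below (consume, finish, setConsumer) are assumptions made up
for this illustration, and UpperCaseEngine and CollectingConsumer are toy
classes that exist only for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Input side. This is also the "callback": whoever produces bytes calls it. */
interface ByteConsumer {
    void consume(byte[] data, int offset, int length); // hypothetical signature
    void finish();                                      // hypothetical end-of-data signal
}

/** Output side: pushes everything it generates to the attached consumer. */
interface ByteProducer {
    void setConsumer(ByteConsumer consumer);            // hypothetical signature
}

/** An engine is simply both sides at once. */
interface ByteByteEngine extends ByteConsumer, ByteProducer {}

/** Toy engine standing in for a real codec: upper-cases ASCII letters. */
final class UpperCaseEngine implements ByteByteEngine {
    private ByteConsumer next;

    public void setConsumer(ByteConsumer consumer) { this.next = consumer; }

    public void consume(byte[] data, int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            byte b = data[offset + i];
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
        }
        next.consume(out, 0, length);   // the "callback" is just the next input interface
    }

    public void finish() { next.finish(); }
}

/** Terminal consumer that collects whatever it is given. */
final class CollectingConsumer implements ByteConsumer {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    public void consume(byte[] data, int offset, int length) { buffer.write(data, offset, length); }
    public void finish() {}
    public byte[] toByteArray() { return buffer.toByteArray(); }
}

final class ProducerConsumerDemo {
    static String run(String text) {
        UpperCaseEngine engine = new UpperCaseEngine();
        CollectingConsumer sink = new CollectingConsumer();
        engine.setConsumer(sink);       // attach the consumer to the producer
        byte[] in = text.getBytes(StandardCharsets.US_ASCII);
        engine.consume(in, 0, in.length);
        engine.finish();
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(run("base64"));
    }
}
```

Note that CollectingConsumer plays exactly the role a DecoderCallback would,
yet it needs no interface of its own.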

The second difference is that I have created specific producers/consumers
for each data type.  If strong typing is not desired, these specific
interfaces could be removed and Producer would always produce data of type
Object and Consumer would always receive data of type Object.  This would
reduce the complexity of my implementation to the same as yours.

Pipelining (for serial pipelines) does not add complexity to my
implementation.  It is directly supported because each stage can act as the
callback object for the previous stage, allowing stages to communicate
directly.  This is very simple and allows great flexibility in creating
encoding chains/pipelines.
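
As a sketch of such a chain (again with assumed method signatures, and with
two toy stages invented for the example rather than taken from my code), an
upper-casing stage can feed a line-wrapping stage with no connector object in
between.  The line wrapper keeps its column count across calls, so it works
no matter how the input is split into chunks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

interface ByteConsumer {
    void consume(byte[] data, int offset, int length); // hypothetical signature
    void finish();
}

interface ByteProducer {
    void setConsumer(ByteConsumer consumer);            // hypothetical signature
}

/** Toy stage: upper-cases ASCII letters and passes the result on. */
final class UpperCaseStage implements ByteConsumer, ByteProducer {
    private ByteConsumer next;

    public void setConsumer(ByteConsumer consumer) { this.next = consumer; }

    public void consume(byte[] data, int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            byte b = data[offset + i];
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
        }
        next.consume(out, 0, length);
    }

    public void finish() { next.finish(); }
}

/** Toy stage: inserts a newline after every `width` bytes. */
final class LineWrapStage implements ByteConsumer, ByteProducer {
    private final int width;
    private int column;                 // bytes already emitted on the current line
    private ByteConsumer next;

    LineWrapStage(int width) { this.width = width; }

    public void setConsumer(ByteConsumer consumer) { this.next = consumer; }

    public void consume(byte[] data, int offset, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < length; i++) {
            if (column == width) {      // line is full: break before the next byte
                out.write('\n');
                column = 0;
            }
            out.write(data[offset + i]);
            column++;
        }
        byte[] wrapped = out.toByteArray();
        next.consume(wrapped, 0, wrapped.length);
    }

    public void finish() { next.finish(); }
}

final class PipelineDemo {
    /** Pushes each chunk through upper-case -> line-wrap -> in-memory sink. */
    static String run(int width, String... chunks) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        UpperCaseStage upper = new UpperCaseStage();
        LineWrapStage wrap = new LineWrapStage(width);

        upper.setConsumer(wrap);        // stage 2 is the callback of stage 1
        wrap.setConsumer(new ByteConsumer() {
            public void consume(byte[] d, int off, int len) { sink.write(d, off, len); }
            public void finish() {}
        });

        for (String chunk : chunks) {
            byte[] bytes = chunk.getBytes(StandardCharsets.US_ASCII);
            upper.consume(bytes, 0, bytes.length);
        }
        upper.finish();
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(run(3, "abc", "defgh"));
    }
}
```

Adding a third stage is one more setConsumer call; none of the existing
stages change.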

Does the above make sense?  If so, please give it careful consideration
because I originally used the callback design and modified it to use
producers/consumers because I think it is actually simpler and is much more
flexible.

If you're still not convinced I guess I'll have to give in and go with the
flow ;-)

> This approach considers decoders to be similar to the way parsers like
> SAX work.  They generate low level events specific to the encoding.  In
> this way they are similar to your notion of producers.  Others can build
> on these fundamental building blocks to take content into question.
> Look at the way DOM is built on SAX or the way API's like digester
> operate on these events to derive more meaning from these events based
> on content.  I'm viewing decoders to be simple event generators based on
> encoding structure and modeling the callback as such events.  This way
> the decoder does what it does minimally leaving any higher level content
> interpretation up to other facilities built on top.

Yep, this is a good approach.  Each decoder should not be concerned with
anything other than implementing an algorithm.

> > > There is no reason why we should cloud this picture for the simple
> > > user.
> >
> > I agree that we definitely don't want to introduce complexity and
> > computational overhead for simple cases.  However I think many of the
> > above concepts can be supported without creating complex APIs.
> >
> > I believe these are the interfaces you have previously posted.  Let me
> > know if I've got the wrong ones :-)
>
> These were updated; here's a new link in the JIRA:
>
> http://nagoya.apache.org/jira/secure/ViewIssue.jspa?key=DIR-30
>
> Feel free by the way to make any additional comments on the JIRA itself.
>
> Are we not presuming that stages will exist when they are not really a
> requirement?  I think what I'm trying to say is that pipelining with
> stages presumes that there is pipelining.  What we should do is make
> sure that pipelining is possible without presuming that decoders will
> always be used in that particular fashion.  I personally will be using
> these decoders in a pipeline within the Eve server for sure.

As I've stated above, stages do not need to add complexity to the design;
they are simply a result of allowing input and output interfaces to
communicate.

> > Firstly, I don't see the need to distinguish between encoding and
> > decoding because it is conceivable for stages to perform some
> > operation that doesn't fit into the traditional encoding/decoding
> > model (eg. a stage wrapping lines every 80 characters).  This means
> > that a single interface is implemented by both encoders and decoders.
>
> I guess this is more a filter concept.  You can have linear
> combinations of filters.  But a decoder and its encoder counterpart are
> special types of filter no?  The codec library is here to specifically
> address the specific subset of encoder decoder filter concerns rather
> than the gamut of filter concerns.  This is just my interpretation
> anyone please correct me if I'm wrong.

You're right, encoders and decoders are special types of filters.  But why
create a distinction between the two when there is no reason to?  If
encoders require different methods to decoders then by all means create
separate interfaces, but they are doing the same thing (transforming data)
so surely we can use the same interfaces for both purposes.

> > Secondly, when I first started designing my implementation I had a
> > similar concept of stages (engines in my implementation) and callbacks
> > (receivers in my implementation).  Then I wanted to couple two stages
> > together.  I needed a connector object to pass output from one stage
> > into another stage.  I realised that I could eliminate the connector
> > object by passing data directly from one stage to the next.  To solve
> > this I came up with the notion of producers and consumers.  A consumer
> > receives data and does something with it.  A producer "produces" data
> > and passes it to a matching consumer.
>
> You really should look at SEDA. I think we're trying to put these
> concepts here into the codec where they really belong in their own
> separate commons project.  I think it was Craig McClanahan who at some
> point had asked about putting together a little simple SEDA library
> together in the commons.  If you have a minute take a look at this
> really (I can't overstate that) simple

Some of the concepts and terminology between codec and an event based server
are similar.  In codec our "events" are data being generated by an
algorithm; in SEDA events are more generic messages being passed between
components of a server.  However the goals of the two are very different:
codec cares about efficient transformation of data and SEDA cares about
enabling highly concurrent processing with huge loads.

I don't believe adding a producer/consumer concept to codec means we're
confusing the boundaries between the two systems simply because they both
use the terms producer and consumer.

> Take a look at the "Reclaiming Type Safety" section in this article on
> the event notification pattern here:
>
> http://members.ispwest.com/jeffhartkopf/notifier/

I'll try to remember when I get into work tomorrow.  (I don't currently have
a net connection at home, hence some of the delay in responding to emails)

> <snip/>
>
> > If simplicity of use is required (for example, simple in-memory data
> > conversion) a simple wrapper class can be defined (implementation
> > removed
>
> <snip/>
>
> See this is too much I think if you have to do that to simplify.  But I
> have this exact same problem when trying to make SEDA stages act as
> simple services to just perform a synchronous task out of band rather
> than the asynchronous, in band, processing of the event stream coming in.

The wrapper class will be necessary regardless of whether you're using a
producer/consumer approach or callbacks.  My statements above may have been
confusing, but the purpose of the wrapper class is to allow a client to use
the library without caring about callbacks, events, etc when the size of
their data is small.

For example, I have an array of data and I want to base64 encode it.  I want
to be able to use an encoder without catching events or processed data, I
just want to call a method passing a byte[] argument and receive a byte[]
result.  That is what the wrapper class provides.
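
Something along these lines is what I have in mind.  This is only a sketch:
the method signatures are assumptions, Base64EncoderStage and InMemoryCodec
are names invented for the example, and the stage borrows the JDK's
java.util.Base64 (Java 8 and later) as a stand-in for a real streaming
encoder so that the example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

interface ByteConsumer {
    void consume(byte[] data, int offset, int length); // hypothetical signature
    void finish();
}

interface ByteProducer {
    void setConsumer(ByteConsumer consumer);            // hypothetical signature
}

/** Push-style Base64 stage. setConsumer must be called before consume. */
final class Base64EncoderStage implements ByteConsumer, ByteProducer {
    private ByteConsumer next;
    private OutputStream encoding;

    public void setConsumer(ByteConsumer consumer) {
        this.next = consumer;
        // Whatever the JDK encoder writes is forwarded to the attached consumer.
        this.encoding = Base64.getEncoder().wrap(new OutputStream() {
            @Override public void write(int b) { next.consume(new byte[] {(byte) b}, 0, 1); }
            @Override public void write(byte[] b, int off, int len) { next.consume(b, off, len); }
        });
    }

    public void consume(byte[] data, int offset, int length) {
        try {
            encoding.write(data, offset, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void finish() {
        try {
            encoding.close();           // emits the final, possibly padded, group
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        next.finish();
    }
}

/** The wrapper: byte[] in, byte[] out, no events visible to the caller. */
final class InMemoryCodec {
    static byte[] base64Encode(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Base64EncoderStage stage = new Base64EncoderStage();
        stage.setConsumer(new ByteConsumer() {
            public void consume(byte[] d, int off, int len) { out.write(d, off, len); }
            public void finish() {}
        });
        stage.consume(input, 0, input.length);
        stage.finish();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = base64Encode("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
    }
}
```

The client only ever sees base64Encode; the consumer plumbing stays inside
the wrapper.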

Of course with either callbacks or producer/consumer the base64 class can
always implement these conversion methods directly, eliminating the need for
wrappers, but in the interest of minimising code I kept it separate for my
implementation.

> I don't want to make this trail any longer than it already is.  I think
> we'll lose certain folks trying to follow it.  But let's just see if we
> can agree that we need to separate codec concerns from SEDA concerns.

I do agree with you here, really :-)  I just don't think I've mixed the two
concerns yet.

> Let's then start another SEDA trail and see if we can both work together
> to create a simple SEDA API.  Are you up for that?

Yeah, I'm interested.  I have a fair bit on at the moment but will see how I
go.

Brett


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

