commons-dev mailing list archives

From "Brett Henderson" <brettchender...@yahoo.co.uk>
Subject RE: [codec] StatefulDecoders
Date Mon, 08 Mar 2004 22:28:49 GMT
> How about we put our minds together and finalize some of this stuff so I can
> start writing some codecs that can be added back to this project?

Yeah definitely, sounds like we're trying to solve the same problem here.

I haven't responded to your previous emails because I haven't contributed
before and was leaving opinions to those who've actually proven themselves.

> > > In general, I have long preferred the pipeline/event model to the approach
> > > that Alex had, where it would give data to the codec, and then poll it for
>
> Agreed! Mine approach was not the best but have you had a chance at looking
> at the new interfaces that I sent out with the callbacks.  Shall I resend
> those?

I still have them here.  I'll comment on them further down.

> Let me just list the requirements one more time:
>
> 1). Interfaces should allow for implementations that perform piece meal
> decodes
>    - enables implementations to have constant sized processing footprints
>    - enables implementations to have efficient non-blocking and streaming
>      operation

Agreed.

> 2). Easily understood and simple to use

Agreed, although this needs to be weighed against any conflicting requirements.

> 3). Interfaces should in no way shape or form restrict or limit the
> performance of implementations what ever they may be.

Agreed, although without knowing all of these implementations in advance we
can never be sure ;-)

> > You're right, my design has no concept of structured content. It was
> > developed to solve a particular problem (ie. efficient streamable data
> > manipulation).  If API support for structured content is required then my
> > implementation doesn't (yet) support it.
>
> You can build on this separately no?  There is no need to have the codec
> interfaces take this into account other then allow this decoration in the
> future rather than inhibit it.

Yes, I can build on it separately, however a new set of producers and
consumers is needed for each type of structured data.  I don't see this as
a problem, because trying to make the API too generic may cost performance
and make it more complicated.

> > I'll use engine for the want of a better word to describe an element in a
> > pipeline performing some operation on the data passing through it.
>
> SEDA calls this a stage btw.

Much better :-)

> With codecs the encoding is variable right?  It could be anything.
> Something has to generate events/callbacks that delimit logical units of the
> encoding what ever that may be.  For some encodings that you mentioned
> (base64) there may not be a data structure but the unit of encoding must be
> at least two characters for base64 I think.  Please correct me if I'm wrong.

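Base64 encodes each 3 bytes of input as 4 bytes of output, and decodes each
4 bytes of input back to 3 bytes of output.  If the input to the encoder is
not a multiple of 3 bytes, the final output group is padded (with '=') to 4
bytes.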
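
To make the arithmetic concrete, here is a small sketch.  It assumes a
runtime that ships java.util.Base64 (Java 8 or later), which is not part of
the codec code being discussed here; the class name Base64Sizes and its
helper methods are made up purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Sizes {
    // Each group of up to 3 input bytes becomes 4 output characters;
    // a short final group is padded with '=' so the output stays a multiple of 4.
    static int encodedLength(int inputLength) {
        return ((inputLength + 2) / 3) * 4;
    }

    // Encodes ASCII text with the JDK encoder (an assumption, see the note above).
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    // Decodes Base64 text back to the original ASCII string.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        for (String text : new String[] {"f", "fo", "foo", "foob"}) {
            String encoded = encode(text);
            System.out.println(text.length() + " bytes -> " + encoded.length()
                    + " chars: " + encoded
                    + " (predicted " + encodedLength(text.length()) + ")");
        }
    }
}
```

So the smallest unit a streaming encoder can emit is one 4 character group,
and it has to hold back up to 2 bytes between calls until the group is
complete or the stream ends.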

> So there is some minimum unit size that can range from one byte to anything
> and this is determined by the codec's encoding and reflected in some form of
> callback.  SAX uses callbacks to allow builders that are content aware do
> their thing right?  Now I'm not suggesting that a base64 codec's
> encoder/decoder pairs make callbacks on every 2 or single byte (depending on
> your direction).  In the case of such a non-structured decoder the buffer
> size would be the determining factor or the end of the stream.

Agreed.

> So I think we need to use callbacks to let decoders tell us when they hit
> some notable event that needs attention whatever that may be.

I agree in principle here although I'm not sure that I agree with the
structure of callbacks.  I'll explain more later.

> > > operations.  These are pipelines; receiving content on one end, performing
> > > operations, and generating events down a chain.  More than one event could
> > > be generated at any point, and the chain can have multiple paths.
>
> This, the pipelining notion, IMHO is overly complicated for building out
> codec interfaces.  The pipeline can be built from the smaller simpler parts
> we are discussing now.  We must try harder to constrain the scope of a
> codec's definition.
>
> Noel as you know I have built server's based on pipelined components before
> and am trying it all over again.  We must spare those wanting to implement
> simple codecs like base64 from these concepts let alone the language around
> them.  The intended use of codecs by some folks may not be so grandiose.
> They may simply need it to just convert a byte buffer and be done with it.
> There is no reason why we should cloud this picture for the simple user.

I agree that we definitely don't want to introduce complexity and
computational overhead for simple cases.  However I think many of the above
concepts can be supported without creating complex APIs.

I believe these are the interfaces you have previously posted.  Let me know
if I've got the wrong ones :-)

public interface StatefulDecoder
{
    void decode( ByteBuffer buffer ) throws DecoderException ;
    void register( DecoderCallback cb ) ;
}

public interface DecoderCallback
{
    void decodeOccurred( StatefulDecoder decoder, Object decoded ) ;
}

I believe we can make the above interfaces more flexible by changing how we
think about each stage.

Firstly, I don't see the need to distinguish between encoding and decoding
because it is conceivable for stages to perform some operation that doesn't
fit into the traditional encoding/decoding model (eg. a stage wrapping lines
every 80 characters).  This means that a single interface is implemented by
both encoders and decoders.

Secondly, when I first started designing my implementation I had a similar
concept of stages (engines in my implementation) and callbacks (receivers in
my implementation).  Then I wanted to couple two stages together, which at
first needed a connector object to pass output from one stage into another
stage.  I realised that I could eliminate the connector object by having each
stage pass its data directly to the next one.  To support this I came up with
the notion of producers and consumers.  A consumer receives data and does
something with it.  A producer "produces" data and passes it to a matching
consumer.

The notion of producers and consumers is very flexible because it allows
classes to be defined which implement one or both interfaces as required by
their purpose.  A stage (or engine in my implementation) is both a producer
and consumer performing some operation on the data before passing it along.
An input adaptor (simply called a producer) is the first stage in a pipeline;
it obtains data from some outside source and sends it into the framework.  An
output adaptor (called a consumer) is the final stage in a pipeline and
passes the data on to some external destination (eg. an OutputStream or
perhaps NIO).

A minor point is that producers should only have to handle one matching
consumer.  Making every producer capable of passing data to multiple
consumers complicates simple encoders and decoders.  I'd prefer to create
separate stages for this purpose (eg. a splitter stage) when data needs to
be passed to multiple destinations.  This is simpler and more efficient than
every stage managing a consumer list when the majority of uses don't require
it.
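
As a rough sketch of what such a splitter stage could look like: the code
below uses a trimmed-down ByteConsumer with only the offset/length process
method, and the names ByteSplitter, CollectingConsumer and fanOut are
hypothetical ones chosen for this example, not classes from my
implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class SplitterSketch {
    static class CodecException extends Exception {
        CodecException(String message) { super(message); }
    }

    // Trimmed-down byte consumer: only the offset/length variant is kept.
    interface ByteConsumer {
        void process(byte[] data, int offset, int length) throws CodecException;
    }

    // Hypothetical splitter stage: the only place that manages a consumer list.
    static class ByteSplitter implements ByteConsumer {
        private final List<ByteConsumer> consumers = new ArrayList<>();

        void addConsumer(ByteConsumer consumer) { consumers.add(consumer); }

        @Override
        public void process(byte[] data, int offset, int length) throws CodecException {
            // Every consumer sees the same array, so consumers must not modify it.
            for (ByteConsumer consumer : consumers) {
                consumer.process(data, offset, length);
            }
        }
    }

    // Hypothetical output adaptor that collects everything it receives in memory.
    static class CollectingConsumer implements ByteConsumer {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        @Override
        public void process(byte[] data, int offset, int length) {
            out.write(data, offset, length);
        }

        byte[] toByteArray() { return out.toByteArray(); }
    }

    // Feeds the input through a splitter with two collectors and returns both copies.
    static byte[][] fanOut(byte[] input) {
        CollectingConsumer first = new CollectingConsumer();
        CollectingConsumer second = new CollectingConsumer();
        ByteSplitter splitter = new ByteSplitter();
        splitter.addConsumer(first);
        splitter.addConsumer(second);
        try {
            splitter.process(input, 0, input.length);
        } catch (CodecException e) {
            throw new IllegalStateException(e);
        }
        return new byte[][] { first.toByteArray(), second.toByteArray() };
    }

    public static void main(String[] args) {
        byte[][] copies = fanOut(new byte[] {1, 2, 3});
        System.out.println(copies[0].length + " bytes reached each of "
                + copies.length + " consumers");
    }
}
```

Ordinary stages keep a single consumer reference, and only pipelines that
really need fan-out pay for the list.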

The next area that my implementation differs in is strong typing of data.
Perhaps strong typing is not critical (collections don't use it) although I
do like to utilise strong typing whenever possible because it eliminates
many bugs and in many cases can make coding more intuitive.  The interfaces
quoted above declare the "event" or data to be of type Object.  When using
the producer and consumer concept, the base producer/consumer interfaces can
be extended for each data type being supported.  For example, byte
processing requires ByteConsumer and ByteProducer interfaces.  The advantage
of this approach is that a ByteProducer stage can never attempt to pass data
to a consumer of a different type (the compiler will enforce this).

To avoid further rambling, I'll explain with some examples.

My consumer interface is:

public interface Consumer {
	public void reset() throws CodecException;
	public void flush() throws CodecException;
	public void finalize() throws CodecException;
}

My producer interface is:

public interface Producer {
}

They're not very useful on their own but must be extended per data type.

My byte consumer interface is:

public interface ByteConsumer extends Consumer {
	public void process(byte data[]) throws CodecException;
	public void process(byte data[], boolean finalize)
		throws CodecException;
	public void process(byte data[], int offset, int length)
		throws CodecException;
	public void process(byte data[], int offset, int length,
		boolean finalize) throws CodecException;
}

My byte producer interface is:

public interface ByteProducer extends Producer {
	public void setConsumer(ByteConsumer consumer);
}

This approach allows consumers to define overloaded process methods that
suit the data type being supported and enforces data types being passed
between stages.
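
As an illustration, the line wrapping stage mentioned earlier could be
sketched roughly as follows.  This is not code from my implementation: the
interfaces are cut down to a single process method, and LineWrapEngine and
the wrap helper are hypothetical names used only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LineWrapSketch {
    static class CodecException extends Exception {
        CodecException(String message) { super(message); }
    }

    // Trimmed-down versions of the byte interfaces (one process method only).
    interface ByteConsumer {
        void process(byte[] data, int offset, int length) throws CodecException;
    }

    interface ByteProducer {
        void setConsumer(ByteConsumer consumer);
    }

    // A stage is both consumer and producer.  It is neither an encoder nor a
    // decoder: it inserts '\n' before the first byte that would overflow a line.
    // Input is treated as opaque bytes; newlines already present are not special.
    static class LineWrapEngine implements ByteConsumer, ByteProducer {
        private static final byte[] NEWLINE = {'\n'};
        private final int width;
        private int column;            // bytes emitted on the current line so far
        private ByteConsumer consumer; // exactly one downstream consumer

        LineWrapEngine(int width) {
            if (width < 1) {
                throw new IllegalArgumentException("width must be at least 1");
            }
            this.width = width;
        }

        @Override
        public void setConsumer(ByteConsumer consumer) { this.consumer = consumer; }

        @Override
        public void process(byte[] data, int offset, int length) throws CodecException {
            if (consumer == null) {
                throw new CodecException("no consumer set");
            }
            int end = offset + length;
            while (offset < end) {
                if (column == width) {
                    consumer.process(NEWLINE, 0, 1);
                    column = 0;
                }
                // Pass along as much as still fits on the current line.
                int chunk = Math.min(width - column, end - offset);
                consumer.process(data, offset, chunk);
                column += chunk;
                offset += chunk;
            }
        }
    }

    // Pushes the chunks through the stage one at a time and collects the output.
    static String wrap(int width, String... chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        LineWrapEngine engine = new LineWrapEngine(width);
        engine.setConsumer((data, offset, length) -> out.write(data, offset, length));
        try {
            for (String chunk : chunks) {
                byte[] bytes = chunk.getBytes(StandardCharsets.US_ASCII);
                engine.process(bytes, 0, bytes.length);
            }
        } catch (CodecException e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(wrap(3, "abc", "defgh"));
    }
}
```

The column count survives between process calls, so the result is the same
however the input happens to be chunked, and passing anything other than a
ByteConsumer to setConsumer is rejected at compile time.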

I believe this approach provides all the benefits of the provided
interfaces, is still simple to implement and supports strong typing.

If simplicity of use is required (for example, simple in-memory data
conversion) a simple wrapper class can be defined (implementation removed
for clarity):
public class SimpleByteByteEngine {
	public SimpleByteByteEngine(ByteByteEngine engine);
	public void reset();
	public byte[] flush();
	public byte[] process(byte data[]);
	public byte[] process(byte data[], boolean finalize);
	public byte[] process(byte data[], int offset, int length);
	public byte[] process(byte data[], int offset, int length,
		boolean finalize);
}
This class accepts a stage object and directly returns the data encoded.
For data contained in a single array, the first overloaded process method
would be called directly returning the result.
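
One way such a wrapper could work internally is sketched below.  It is only
a guess at the shape, under the assumption that a ByteByteEngine is simply
something that is both a ByteConsumer and a ByteProducer; SimpleEngine,
UpperCaseEngine and the upper helper are hypothetical names, and only two
of the process overloads are shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SimpleWrapperSketch {
    static class CodecException extends Exception {
        CodecException(String message) { super(message); }
    }

    interface ByteConsumer {
        void process(byte[] data, int offset, int length) throws CodecException;
    }

    interface ByteProducer {
        void setConsumer(ByteConsumer consumer);
    }

    // Assumed shape of an engine: both a consumer and a producer of bytes.
    interface ByteByteEngine extends ByteConsumer, ByteProducer {
    }

    // Toy engine for the demo: upper-cases ASCII letters.
    static class UpperCaseEngine implements ByteByteEngine {
        private ByteConsumer consumer;

        @Override
        public void setConsumer(ByteConsumer consumer) { this.consumer = consumer; }

        @Override
        public void process(byte[] data, int offset, int length) throws CodecException {
            if (consumer == null) {
                throw new CodecException("no consumer set");
            }
            byte[] out = new byte[length];
            for (int i = 0; i < length; i++) {
                byte b = data[offset + i];
                out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - 32) : b;
            }
            consumer.process(out, 0, length);
        }
    }

    // The wrapper plugs an in-memory buffer in as the engine's consumer and
    // hands back whatever the engine produced during each call.
    static class SimpleEngine {
        private final ByteByteEngine engine;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        SimpleEngine(ByteByteEngine engine) {
            this.engine = engine;
            engine.setConsumer((data, offset, length) -> buffer.write(data, offset, length));
        }

        byte[] process(byte[] data) throws CodecException {
            return process(data, 0, data.length);
        }

        byte[] process(byte[] data, int offset, int length) throws CodecException {
            engine.process(data, offset, length);
            byte[] result = buffer.toByteArray();
            buffer.reset(); // start empty for the next call
            return result;
        }
    }

    // Convenience helper for the demo: converts a whole ASCII string in one call.
    static String upper(String text) {
        SimpleEngine simple = new SimpleEngine(new UpperCaseEngine());
        try {
            byte[] result = simple.process(text.getBytes(StandardCharsets.US_ASCII));
            return new String(result, StandardCharsets.US_ASCII);
        } catch (CodecException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(upper("hello, codec"));
    }
}
```

The simple user only ever sees byte arrays going in and coming out; the
producer/consumer plumbing stays hidden inside the wrapper.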

The full implementation is at:
http://www32.brinkster.com/bretthenderson/bhcodec-0.7.zip

> Official disclaimer: Alex may not know what he's talking
> about, and these
> 
> are his views at the present moment, which do not necessarily
> reflect those
> 
> of other personalities that he may exhibit at a later date.

Hehe, me too.

> Alex

Brett

