Reply-To: "Apache Directory Developers List"
 <directory-dev@incubator.apache.org>
Subject: Re: [asn1] Rewrite goals
From: Emmanuel Lecharny <elecharny@iktek.com>
To: Apache Directory Developers List <directory-dev@incubator.apache.org>
In-Reply-To: <421A46CD.40204@bellsouth.net>
References: <421A46CD.40204@bellsouth.net>
Content-Type: text/plain
Date: Tue, 22 Feb 2005 01:30:00 +0100
Message-Id: <1109032200.5471.80.camel@portable.iktek.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit

> 1). We would like to make sure the implementation is easy to comprehend 
> (requiring as little a learning curve as possible), is easy to maintain 
> and hence extend.  More importantly we must lean towards making it 
> easier for users than for implementors of codecs or stub compilers.

+1. We also need to explain the choices that have been made, and the
expected improvement in each part. There are places where, despite a
small performance loss, we could stay with more readable code. For
instance, adding some transition states that do nothing but log a
trace in debug mode could cost a little. Performance is not the
Holy Grail.

I may be wrong, but a 10% improvement in performance is not worth a
penny if it leads to bloated code. (Remember those C atrocities like
unrolling 'for' loops...)


> 2). We would like the runtime to be fast and efficient.  We want to make 
> sure the runtime more so than the compiler is most efficient both from 
> the standpoint of time and space.  With the server, a fixed operational 
> footprint for both encoders and decoders is critical.  Having the 
> footprint vary as a function of the PDU size makes a server susceptible 
> to DoS attacks.

Time is only one aspect, and the easier one to deal with. Space is much
more difficult. We must act as if we have a limited amount of space.
An OutOfMemoryError is NOT an option. The best solution could be to
pre-allocate all the structures and store them in a pool. It may cost
much more than creating those objects on the fly, but at least we can
limit the memory footprint.
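
To make the pool idea concrete, here is a minimal sketch. The class
name TlvBufferPool and the choice of plain byte[] buffers are my own
assumptions for illustration, not existing code. Everything is
allocated in the constructor, so the footprint is fixed from then on.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical fixed-size pool: every buffer is allocated up front. */
final class TlvBufferPool {
    private final BlockingQueue<byte[]> free;

    TlvBufferPool(int count, int bufferSize) {
        free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    /** Returns a buffer, or null when the pool is exhausted. */
    byte[] borrow() {
        return free.poll();
    }

    /** Gives a buffer back; returns false if the pool is already full. */
    boolean release(byte[] buffer) {
        return free.offer(buffer);
    }

    /** Number of buffers currently available. */
    int available() {
        return free.size();
    }
}
```

A null from borrow() is the signal that we have run out of
pre-allocated structures; what to do then is a separate policy
decision.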

One way of doing that, as we will have many threads decoding at the
same time, could be to allocate those structures globally and hand them
out to requesting threads, which will keep them on their own stack. If
we reach a starvation state, three options are available:
  -1- create new objects, if the memory footprint is still low (we will
have to test this memory footprint)
  -2- kill some decoding process (as if we were in a deadlock
condition), which leads to the difficult decision: who will get the
short straw?
  -3- serialize some objects. If we keep track of object sizes, we
could perfectly well decide to serialize objects bigger than a specific
size (let's say 4096 bytes), then, if that's not enough, lower the
threshold to 2048 bytes, and so on. It's a kind of swapping system. It
could be implemented, but it's much more complicated than -2- or -1-.
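
The choice among the three options above could be isolated in a small
policy object. This is only a sketch: StarvationPolicy, the heap-ratio
threshold and the ordering (create, then swap if enabled, else kill)
are assumptions of mine, not a decided design.

```java
/** Hypothetical policy choosing among the three starvation options. */
final class StarvationPolicy {
    enum Action { CREATE_NEW, KILL_DECODER, SWAP_OUT }

    private final double maxHeapRatio; // assumed threshold, e.g. 0.75
    private final boolean swapEnabled; // whether option -3- is implemented

    StarvationPolicy(double maxHeapRatio, boolean swapEnabled) {
        this.maxHeapRatio = maxHeapRatio;
        this.swapEnabled = swapEnabled;
    }

    /** Decides from explicit numbers, which keeps the rule testable. */
    Action decide(long usedBytes, long maxBytes) {
        if ((double) usedBytes / maxBytes < maxHeapRatio) {
            return Action.CREATE_NEW;      // option -1-: footprint still low
        }
        return swapEnabled
                ? Action.SWAP_OUT          // option -3-: serialize objects
                : Action.KILL_DECODER;     // option -2-: someone gets the short straw
    }

    /** Decides from the current JVM heap usage. */
    Action decide() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return decide(used, rt.maxMemory());
    }
}
```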

Using memory-mapped files (NIO's MappedByteBuffer) could help.
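
For reference, this is roughly what swapping bytes out through a
mapped file looks like with NIO. It is a sketch only: the class
MappedSwap and the temp-file naming are hypothetical, and a real
swapping layer would keep the mapping open instead of doing a single
round trip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: writes bytes through a file mapping and reads them back. */
final class MappedSwap {
    static byte[] roundTrip(byte[] data) {
        try {
            Path file = Files.createTempFile("asn1-swap", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping in READ_WRITE mode grows the file to the requested size.
                MappedByteBuffer map =
                        ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
                map.put(data);   // "swap out"
                map.flip();      // rewind to read what was written
                byte[] copy = new byte[data.length];
                map.get(copy);   // "swap in"
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```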


> 3). Different protocols will have different encoding requirements or 
> restrictions.  Furthermore different ASN.1 encodings will incur more 
> costs since they have more rules.  There is no need to compromise 
> performance for a generalized solution.  Using a combination of patterns 
> we should be able to squeeze as much performance as possible without 
> compromising ease of use while meeting the needs of different encodings: 
> ultimately the needs of different protocols.  Separate implementations 
> can be made pluggable to reduce overheads while keeping the 
> implementation extremely easy to comprehend, debug and maintain.  
> Overall we want to have some semblance of a generalized or unified ASN.1 
> runtime approach without paying for the penalties.  I think this will 
> put the patterns we choose to use to the test.

DER is a restricted BER, so we may inherit from BER. Either we
copy/paste the code and add the restrictions (yuck...), or we try to
keep most methods common and write DER-specific pieces of code when
necessary.
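
As an illustration of the second approach, here is a sketch limited to
length decoding. The class names and the two hook methods are
hypothetical. The BER class does the common work; the DER subclass
only adds the restrictions (no indefinite form, minimal encoding).

```java
import java.nio.ByteBuffer;

/** Hypothetical BER length decoder with hooks for stricter encodings. */
class BerLengthDecoder {
    static final int INDEFINITE = -1;

    int decodeLength(ByteBuffer in) {
        int first = in.get() & 0xFF;
        if (first < 0x80) {
            return first;                       // short form
        }
        if (first == 0x80) {
            return onIndefinite();              // indefinite form
        }
        int count = first & 0x7F;               // long form: number of length octets
        if (count > 4) {
            throw new IllegalArgumentException("length too large for this sketch");
        }
        long value = 0;
        for (int i = 0; i < count; i++) {
            value = (value << 8) | (in.get() & 0xFF);
        }
        if (value > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("length too large for this sketch");
        }
        checkLongForm(count, value);
        return (int) value;
    }

    /** BER accepts the indefinite form. */
    protected int onIndefinite() {
        return INDEFINITE;
    }

    /** BER accepts any long-form encoding. */
    protected void checkLongForm(int count, long value) {
    }
}

/** DER: same decoding, plus the restrictions. */
class DerLengthDecoder extends BerLengthDecoder {
    @Override
    protected int onIndefinite() {
        throw new IllegalArgumentException("DER forbids indefinite length");
    }

    @Override
    protected void checkLongForm(int count, long value) {
        // Minimal encoding: the value must need the long form and all its octets.
        if (value < 0x80 || value < (1L << (8 * (count - 1)))) {
            throw new IllegalArgumentException("DER requires minimal length encoding");
        }
    }
}
```

The length 5 encoded as 0x81 0x05 is valid BER but not valid DER, and
only the subclass has to know that.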

I think that we can forget about CER.

PER is a completely different thing. It deserves a full decoder of its
own, with no possibility of sharing a single piece of code with the
BER/DER decoder.

What about XER? For those Xtreme Xml Xperts, we may add it to a roadmap,
very far from where we are?

In any case, decoders should be stateless, and should expose an
interface to the upper layer (stubs).

Decoders *must* fail fast. That means we must check every tag to see
whether it's an allowed one with respect to the state automaton which
implements the grammar (LL grammar <==> state automaton). We also have
to check lengths and values. They could be constrained
(INTEGER (0..127)), and we want to stop the decoding as soon as a
constraint is not respected.
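
A tiny sketch of the tag check, for a made-up grammar
SEQUENCE { INTEGER, OCTET STRING }. The class TagAutomaton and its
states are hypothetical. The transition function is pure: the state
lives with the caller, so the decoder itself stays stateless.

```java
/** Hypothetical automaton for SEQUENCE { INTEGER, OCTET STRING }. */
final class TagAutomaton {
    enum State { START, EXPECT_INTEGER, EXPECT_OCTET_STRING, DONE }

    /** Returns the next state, or throws at the first disallowed tag. */
    static State next(State current, int tag) {
        switch (current) {
            case START:
                if (tag == 0x30) return State.EXPECT_INTEGER;       // SEQUENCE
                break;
            case EXPECT_INTEGER:
                if (tag == 0x02) return State.EXPECT_OCTET_STRING;  // INTEGER
                break;
            case EXPECT_OCTET_STRING:
                if (tag == 0x04) return State.DONE;                 // OCTET STRING
                break;
            default:
                break;
        }
        throw new IllegalStateException(
                String.format("tag 0x%02X not allowed in state %s", tag, current));
    }
}
```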

Those tests must be very fast, and must not be a part of the decoder.
We will need to implement either a callback mechanism or generic,
parameterized matching rules for each type of constraint.

The callback approach sounds simpler to implement, and could really
easily be generated by a compiler.
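
Here is what the callback approach might look like for the
INTEGER (0..127) case. All names (ValueConstraint, RangeConstraint,
IntegerValueDecoder) are hypothetical; the point is only that the
decoder calls the check without knowing what it verifies.

```java
/** Hypothetical callback the decoder invokes once a value is decoded. */
interface ValueConstraint {
    void check(long value);
}

/** The kind of class a compiler could generate for INTEGER (min..max). */
final class RangeConstraint implements ValueConstraint {
    private final long min;
    private final long max;

    RangeConstraint(long min, long max) {
        this.min = min;
        this.max = max;
    }

    @Override
    public void check(long value) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                    value + " is outside (" + min + ".." + max + ")");
        }
    }
}

final class IntegerValueDecoder {
    /** Decodes two's-complement INTEGER content, then runs the callback. */
    static long decode(byte[] content, ValueConstraint constraint) {
        if (content.length == 0 || content.length > 8) {
            throw new IllegalArgumentException("unsupported INTEGER size");
        }
        long value = content[0];                 // first octet carries the sign
        for (int i = 1; i < content.length; i++) {
            value = (value << 8) | (content[i] & 0xFF);
        }
        constraint.check(value);                 // fail fast on violation
        return value;
    }
}
```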

> 
> 4). With #3 it's clear we want a unified interface for codecs even if 
> implementations under the hood are made pluggable when the protocol and 
> encoding is determined. 

+1

> 
> 5). We must also consider how the constructs in the runtime affect the 
> stub compiler.  We want to prioritize a few things in the runtime over 
> the compile time such as performance.  However we must draw a line where 
> runtime constructs make it difficult or impossible for a stub compiler 
> to generate code.  These trade offs I think will make themselves more 
> apparent as we begin to investigate patterns and mechanisms for the 
> runtime in conjunction with the build time.  This is not, at all, an 
> easy thing to foresee.

No dependency should exist between the compiler and the runtime. An
interface has to be defined (cf. 4) and used. It's like having a lexer
and a parser: the parser just uses the lexer's results but doesn't have
to know how those results are produced.


> Perhaps most importantly we need to make sure all ideas regarding ASN1 
> are in the open.  I'm sounding like a broken record here, I know!  But I 
> mean this in more than just a "put it on the list" sort of way.  For 
> example, Alan has some great pattern usage in his branch and Emmanuel 
> has great ideas he's putting forth in his wiki.  Let's begin tabula rasa 
> and make sure everyone is expressing their ideas yet again even if 
> documented or within a code branch.  We can each delve into the work of 
> others to look and learn but we should give some effort towards directly 
> conveying the ideas we think are of value.  This will save time while 
> investigating what others are doing. 

A wiki, even if not really the best tool for putting down ideas, is
like a blackboard: easy to write something, easy to erase it, easy to
add comments to it. We should also define a roadmap. That will allow us
to stay focused on the important points. It's not really a good thing
to have the fastest ASN.1 compiler on earth, or the smallest decoder on
the planet, if it stands alone and is not pluggable. We should deliver,
on schedule if possible, at the price of less performance or less
functionality if necessary. Nothing is more important than delivering.
(I mean delivering something that works!!!) Being ridiculously slow or
lacking many features is better than being unusable.

Code reviews could be something to consider.

I know that it sounds like pure common sense, but I have experienced so
much common nonsense...

btw, we should not have many branches. Merging is error-prone and quite
painful to do, so I think that one branch is enough. (I'm not talking
about maintenance branches.)


> Do not presume others know what we have already discussed or what you 
> (anyone) have cached within your head.  Take nothing for granted and 
> presume people have not looked at your wiki, your online docs, or what's 
> in your branch.
> 
> Let's start fresh, together!
> 
> Cheers,
> Alex