directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@iktek.com>
Subject Re: [asn1] why use TLV objects at all?
Date Thu, 24 Feb 2005 09:33:06 GMT
Hi all!


On Thursday, 24 February 2005 at 01:55 -0500, Alex Karasulu wrote:
> Alan D. Cabrera wrote:
> 
> >
> >
> > Alex Karasulu wrote:
> >
> >> Alan D. Cabrera wrote:
> >>
> >>>
> >>>
> >>> Alex Karasulu wrote:
> >>>
> >>>> Emmanuel,
> >>>>
> >>>> I was just thinking about your position on object creation.  Namely
> >>>> the one that is against the creation of Tuple objects that 
> >>>> represent TLVs.  Your proposal to use pooling of these objects 
> >>>> worries me a bit.  It just makes me think there would be a lot of 
> >>>> synchronization overhead.  I may be wrong.

No synchronization: it's a thread-local pool, each thread has its own. You
won't have 10,000 threads, so it's OK.
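
Something like this, just to give the idea (a rough sketch in Java; the TLV
class and its fields are made up for the example, not the real codec classes):

    import java.util.LinkedList;

    public final class TlvPool {

        // Minimal, hypothetical TLV holder used only for this sketch.
        public static final class TLV {
            int tag;
            int length;
            byte[] value;

            void reset() { tag = 0; length = 0; value = null; }
        }

        private static final int POOL_SIZE = 256;

        // One list of preallocated TLVs per thread, so no locking is ever needed.
        private static final ThreadLocal<LinkedList<TLV>> POOL =
            new ThreadLocal<LinkedList<TLV>>() {
                protected LinkedList<TLV> initialValue() {
                    LinkedList<TLV> tlvs = new LinkedList<TLV>();
                    for (int i = 0; i < POOL_SIZE; i++) {
                        tlvs.addFirst(new TLV());
                    }
                    return tlvs;
                }
            };

        public static TLV acquire() {
            LinkedList<TLV> tlvs = POOL.get();
            // Fall back to a fresh instance if the pool ever runs dry.
            return tlvs.isEmpty() ? new TLV() : tlvs.removeFirst();
        }

        public static void release(TLV tlv) {
            tlv.reset();
            POOL.get().addFirst(tlv);
        }
    }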

> >>> I was also concerned by this as it may require that you keep a rough 
> >>> factor of 2 more memory, one for the Tuple structure of the message 
> >>> and one for the POJO that you are creating.

TLVs are allocated up front, so it's not a problem. The value part is handed
to the stubs as they read it, so it is the only variable part. You create the
value while reading the PDU and pass it to the stub. So memory consumption is
roughly sizeof(stubs) + sizeof(data) + sizeof(preallocated TLVs). It's really
important that the memory footprint stays more or less static, even if it is
big at the beginning. We are trading initial memory need for long-term
stability.

> >> It would be if we were collecting all PDU tuples to form a TLV tree.  
> >> However the idea is to use and release whatever is allocated to the 
> >> tuple.  In this case, the only time you have two copies of a datum 
> >> (tuple value) is when you are holding on to the value long enough to 
> >> set a stub's property with the tuple value.

We don't have two copies of a datum. When the TLV is a primitive one, as
soon as its data has been completely read, we can pass it to the POJO/stub.
We just keep a reference to it in the TLV; the only duplication that could
occur is in the T and L parts, but, again, that is not really a duplication:
TLVs are allocated from the beginning, and won't be released until you stop
the server.
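
To make it concrete, here is roughly what I have in mind for a primitive TLV
(a sketch only; the stub class and its setter are invented names):

    import java.nio.ByteBuffer;

    // The value is allocated once while reading the PDU, then handed to the
    // stub by reference, so the datum itself is never duplicated.
    final class PrimitiveTlvReader {

        static void readPrimitive(ByteBuffer pdu, BindRequestStub stub) {
            int tag = pdu.get() & 0xFF;        // T
            int length = pdu.get() & 0xFF;     // L (short form only, for brevity)
            byte[] value = new byte[length];   // the only per-message allocation
            pdu.get(value);                    // V, read straight from the PDU
            stub.setName(value);               // the stub keeps a reference, not a copy
        }
    }

    // Hypothetical stub, just to show that it stores the reference.
    final class BindRequestStub {
        private byte[] name;

        void setName(byte[] name) { this.name = name; }
    }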

> >> Furthermore if we implement the strategy of streaming a large value 
> >> to disk (say a JPEG photo) then the value is just a URI to access the 
> >> stream later on.  This URI is what is set as the stub property 
> >> value.  So in this case we don't have the double hit as mentioned 
> >> above where a value is in memory in a Tuple and duplicated in the 
> >> value of the stub property.

+1 for the URI. It could also be a StreamedTLV subclass, which has the same
interface. The implementation of its getData method will handle the
situation.
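
For example (a sketch; class and method names are just illustrative, not the
real interface):

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;

    // Plain in-memory TLV: getData() serves the value it holds.
    class Tlv {
        protected byte[] value;

        InputStream getData() throws IOException {
            return new ByteArrayInputStream(value);
        }
    }

    // Streamed variant: same interface, but the large value was spooled to
    // disk while the PDU was read, and is reached through a URI.
    class StreamedTlv extends Tlv {
        private final URI location;   // e.g. file:///tmp/large-jpeg-value

        StreamedTlv(URI location) {
            this.location = location;
        }

        InputStream getData() throws IOException {
            return location.toURL().openStream();
        }
    }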

> >
> >
> > So we only keep a stack of tuples?
> 
> You mean constructed tuples for nesting?  Depends on the stub.  I don't 
> think even that may be needed.  Don't know for sure yet though.

The stub/POJO is just the final representation of the data. Obviously we can
avoid all that TLV plumbing if we have a compiler that handles it. A
depth-first decoding strategy is faster than a two-layer parser/lexer
strategy, but it's much more complicated.


> >>>> However I started thinking, "why create Tuples at all?" Follow my 
> >>>> concepts here for a sec even though we have not been discussing 
> >>>> these constructs: TupleProducers and TupleConsumers.  A producer 
> >>>> simply emits callbacks to a consumer and they are bound to each 
> >>>> other.  What if the callbacks did not pass in a Tuple as an 
> >>>> argument but the components T, L and V of the Tuple instead.  A 
> >>>> stub, which is like the parser you mentioned, tracks and changes 
> >>>> state as an automaton to populate its properties appropriately with
> >>>> the stream of Tuple events.  The stub can be a TupleConsumer - 
> >>>> really a tuple event consumer rather.  This would eliminate object 
> >>>> creation overheads and populate the stub. 

If you want to control L, you need to keep track of the constructed TLVs.
Primitive TLVs are not very important; we can discard them immediately, just
keeping their V part. So keeping a stack of constructed TLVs is just a matter
of checking the lengths.
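
A sketch of what I mean by the stack (hypothetical code, not the current
decoder):

    import java.util.LinkedList;

    // Each constructed TLV declares how many bytes its children must occupy;
    // the stack lets us flag a bad PDU as soon as a declared L is violated.
    final class LengthTracker {

        // Remaining expected bytes for each open constructed TLV, innermost first.
        private final LinkedList<int[]> open = new LinkedList<int[]>();

        // Called when a constructed TLV header has been read.
        void pushConstructed(int headerSize, int declaredValueLength) {
            consume(headerSize + declaredValueLength);  // it counts against its parent
            open.addFirst(new int[] { declaredValueLength });
        }

        // Called when a primitive TLV (header plus value) has been fully read.
        void consume(int bytes) {
            if (open.isEmpty()) {
                return;                                 // top-level TLV, nothing to check
            }
            int[] remaining = open.getFirst();
            remaining[0] -= bytes;
            if (remaining[0] < 0) {
                throw new IllegalStateException("Bad PDU: children overflow declared length");
            }
            while (!open.isEmpty() && open.getFirst()[0] == 0) {
                open.removeFirst();                     // this constructed TLV is complete
            }
        }
    }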

> >>> Could you not flatten it even further by making a compiler generated 
> >>> stub act as both the producer and consumer?  This is the tack that I 
> >>> am taking with my "smart" stubs.

Yes, for sure. But then it will become difficult to track bad PDUs (I mean,
PDUs in which the lengths are not correct).



> >> I highly discourage this approach.  Reason being the nature of the 
> >> relationship between ASN.1 and encodings.  As you know an ASN.1 spec 
> >> can use any encoding.  Conventionally a protocol specifies an 
> >> encoding and sticks to it so it seems to support your approach.  This 
> >> however is not always the case and ASN.1 is being used in new ways 
> >> where alternate encodings are being applied to different data 
> >> structures based on the target: i.e. GSM network clients.  However 
> >> these are not the strongest cases for why you should avoid this 
> >> "smart" stub approach IMO.
> >
> >
> > Each stub is specific to a particular encoding.  It is the POJOs that 
> > are universal across the encodings.
> 
> Ahh ok you mean there's a difference between the stub and a POJO.  I 
> thought the POJO was the stub.  Or are you referring to some base class or 
> POJI?

We should agree on terms, don't you think? In my mind, a POJO is an instance
of an ASN.1 path through a specific grammar (for example, an LdapBindResponse
POJO, or an LdapSearch POJO). The stub is the class that feeds the POJO with
data. So the stub is the POJO producer/consumer (with or without TLVs).
wdyt?
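
In code, the distinction I have in mind looks like this (made-up names and
fields, just to illustrate the vocabulary):

    import java.io.UnsupportedEncodingException;

    // The POJO: one path through the LDAP grammar, nothing but data.
    final class LdapBindResponsePojo {
        private int resultCode;
        private String matchedDn;

        void setResultCode(int resultCode) { this.resultCode = resultCode; }
        void setMatchedDn(String matchedDn) { this.matchedDn = matchedDn; }
    }

    // The stub: it consumes decoded values (with or without TLVs) and feeds
    // the POJO with them.
    final class LdapBindResponseStub {
        private final LdapBindResponsePojo pojo = new LdapBindResponsePojo();

        void onResultCode(byte[] value) {
            pojo.setResultCode(value[0] & 0xFF);
        }

        void onMatchedDn(byte[] value) throws UnsupportedEncodingException {
            pojo.setMatchedDn(new String(value, "UTF-8"));
        }

        LdapBindResponsePojo getPojo() { return pojo; }
    }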


> >> The most important reason is to decouple the generation of encoding 
> >> specific code from the stub compiler.  If you make your stubs 
> >> "encoding aware" then your adding some serious complexity to the stub 
> >> compiler IMO.  Why do this when you can avoid it and gain the ability 
> >> to swap out the encoding at runtime?
> >
> >
> > You have the ability to swap out encodings at runtime; you can just 
> > switch stubs. 

Both of you are right, it's just a question of "how long will it take to
write the compiler?" versus "do we really need a compiler at the
moment?"

> 
> So interface or base class is the same but concrete implementation is 
> the stub for a particular encoding?
> 
> >
> >> The way I like to visualize this is ... there is a common 
> >> representation the stub compiler needs to work with.  Rather than 
> >> read bytes from a stream it responds to tuple events as its input at 
> >> a higher level.  Regardless of the type of encoding at the lowest 
> >> level the stub compiler and the stubs it generates need not be 
> >> aware.  It's sort of like the way javac works with the underlying 
> >> runtime: the compiled byte code is bound at runtime to the underlying 
> >> native code that does the actual work.  
> >> Similarly here I'm recommending that the stub compiler generate a 
> >> stub which deals only with TLVs and at runtime the source/target can 
> >> be a BER, DER, or PER binary stream, or even an XER-encoded ASCII stream.
> >> I think perhaps some of your concerns on the stub compiler side 
> >> revolve around finding a tangible way for the antlr based stub 
> >> compiler to generate code that deals with a TLV stream rather than a 
> >> byte/char stream.  I too have this problem - it is not easy.  In this 
> >> regard the approach of making the stub totally encoding aware may 
> >> seem easier to do.
> >
> >
> > IIUC, PER does not use TLVs.  You need to know the structure of your 
> > ASN.1 object to decode the stream.
> >
> > Keep it simple.  We may as well remove the layer and generate protocol 
> > specific stubs.
> 
> If you're writing the stub compiler then its your call.  However I'm 
> still not convinced this is keeping it simple.  Have you already 
> finished the parts of the compiler that can handle different encodings?

Alan is perfectly right. PER is really inseparable from the specific ASN.1
grammar it encodes. You need to know the semantics of the incoming data to
decode it, because T and L are optional (I mean, not optional in the sense
that you are allowed to skip them, but their presence depends on the
grammar). So when writing an ASN.1 decoder for a specific grammar using PER,
the layered approach is totally useless. It's much more a "give me the 5
following bits, which I know represent the value I'm reading" kind of
decoder. Quite complicated to implement, but that's the way it works. You
need a compiler.
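
To illustrate (a toy sketch, not real PER; the field widths below are
invented, and the actual PER rules are much more involved):

    // There is no T or L on the wire, so the decoder must already know, from
    // the ASN.1 grammar, how many bits the next field occupies.
    final class PerStyleDecoder {

        static final class BitReader {
            private final byte[] data;
            private int bitPos;

            BitReader(byte[] data) { this.data = data; }

            // Read the next n bits (n <= 31) as an unsigned integer.
            int read(int n) {
                int result = 0;
                for (int i = 0; i < n; i++, bitPos++) {
                    int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                    result = (result << 1) | bit;
                }
                return result;
            }
        }

        // The widths come from the grammar, not from the stream itself.
        static void decodeExample(byte[] encoded) {
            BitReader in = new BitReader(encoded);
            int messageId  = in.read(5);   // e.g. INTEGER (0..31)          -> 5 bits
            int resultCode = in.read(3);   // e.g. ENUMERATED with 8 values -> 3 bits
            System.out.println("messageId=" + messageId + ", resultCode=" + resultCode);
        }
    }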

If you have this PER-enabled codec compiler, having the same
BER/CER/DER-enabled codec compiler is a piece of cake. No layers, no TLVs,
just a compiler.

This could also be a perfect new Apache project: a free Apache ASN.1
compiler (a GPLized SNACC, in a way!)

How far are we from this target? 

I also want to find out the cost of encoding/decoding data compared to the
cost of fetching/storing it in the database. If it's a 50/50 ratio, we could
get a major performance improvement by first implementing the layered
approach, then the compiler one when it's ready. If it's 10/90, forget about
the layers. Let's focus on the compiler on one side, and on the other
performance issues on the other.

wdyt?




