directory-dev mailing list archives

From <>
Subject Re: [asn1] Stateful Decoder question.
Date Wed, 05 Jan 2005 18:31:56 GMT
Hi Robert,

> From: Robert Newson <>
> Date: 2005/01/04 Tue PM 11:36:20 EST
> To:
> Subject: [asn1] Stateful Decoder question.
> Hi,

> I started building an IMAP grammar with Antlr which can handle a
> useful subset of the full IMAP grammar and I'm happy with this
> approach. The generated parser blocks for I/O if its input is
> incomplete, so I need to decode in several passes.
> My question (finally) is, is the StatefulDecoder work you're doing in
> the Asn1 project applicable to my problem? I see that there's a basic
> level that is Asn1-agnostic.

The codec package in the ASN.1 subproject is actually independent of ASN.1.  I basically needed
some interfaces to decode data in chunks while it was streamed into the server.  I used a callback
mechanism, presuming that implementations of these interfaces (the actual codecs) would decode on
a per-chunk basis and even stream large pieces of data to disk rather than keeping them in
memory.  This way the memory needed stays fixed while handling messages of variable
size.  For a server this is critical, especially with the potential for DoS attacks.  Plus,
this class of non-blocking chunking codecs maintains state between operations (hence the name),
so they are a good fit for non-blocking constructs in NIO.
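To make the callback idea concrete, here is a minimal sketch of what such a push-style, stateful
decoder can look like.  It is not the API from the ASN.1 subproject: the names DecodeCallback,
ChunkDecoder and LineChunkDecoder are hypothetical, and decoding LF-terminated lines is just an
assumed stand-in for a real protocol unit.  The point is that decode() consumes whatever bytes
have arrived, keeps the unfinished remainder as state, and caps that state at a fixed size.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical callback: invoked each time a complete unit has been decoded. */
interface DecodeCallback {
    void decoded(String unit);
}

/** Hypothetical push-style decoder: the caller feeds whatever bytes have arrived. */
interface ChunkDecoder {
    void decode(ByteBuffer chunk);
}

/** Emits one callback per LF-terminated line; a partial line is kept as state between calls. */
class LineChunkDecoder implements ChunkDecoder {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final DecodeCallback callback;
    private final int maxLineLength;

    LineChunkDecoder(DecodeCallback callback, int maxLineLength) {
        this.callback = callback;
        this.maxLineLength = maxLineLength;
    }

    @Override
    public void decode(ByteBuffer chunk) {
        // Consume only what is available, then return; never wait for more input.
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            if (b == '\n') {
                String line = new String(pending.toByteArray(), StandardCharsets.US_ASCII);
                pending.reset();
                if (line.endsWith("\r")) {
                    line = line.substring(0, line.length() - 1); // drop the CR of a CRLF
                }
                callback.decoded(line);
            } else {
                // Bound the buffered state so a hostile peer cannot exhaust memory.
                if (pending.size() >= maxLineLength) {
                    throw new IllegalStateException("line exceeds " + maxLineLength + " bytes");
                }
                pending.write(b);
            }
        }
    }
}
```

A selector loop would call decode() with each buffer it reads from a channel; the callback fires
only when a full line is available, no matter how the bytes were split across reads.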

These interfaces are rather general and I think I will make them more specific for the ASN.1
stuff.  I made them general to try to get the code into jakarta-commons codec.  However,
I have abandoned this notion, at least for now.

> I'm keen to build a high-performance, non-blocking and elegant
> solution to this problem, but I'm now thrashing backwards and forwards
> for the right tool.

I totally understand where you are coming from.  I was confronted with the same problem
when coming up with these interfaces.  It's a tough one.  I got to a point where I can almost
solve the problem gracefully.  I will refactor asn1 aggressively in a few weeks to fix various
uglies and deficiencies.

However, the ideal solution here, if I could have a wish, is for a tool like antlr to generate
stateful parsers (push parsers) that can be fed a chunk of input at a time without blocking.
How awesome would that be?  The same grammar should generate both types of parsers.  Then
writing protocol codecs would be a cakewalk.  The codecs are usually half the battle in writing
a protocol server, regardless of whether the protocol is text based or binary.

Unfortunately there was nothing like that when I last searched 3 months ago.  I'd love to be
able to modify antlr to do this and conditionally put a threshold on the input as it arrives
so antlr can stream decoded results/output to disk.  But time is finite :(.

Perhaps you might like to carve out your own interfaces for doing this.  Unfortunately I will
change the stateful stuff to be even more specific to ASN.1 or binary encodings.  However,
there really is nothing to these APIs: they're a joke and not worth the dependency.  I'm
sure you can carve out your own callback-based API and do much better than I did here.  There may
be better producer-consumer models for pushing data into a stateful parser to process your
email data.  But this does mean you might have to hand code your own parser instead of using
antlr, which only produces blocking parsers with all contents maintained in memory.

Now that I think of it, you might be able to do both.  Hmmm, I'm just pulling this out of my
arse so bear with me.  You can get antlr to generate your lexer/parser pair and break apart
the generated code.  I'm sure there are small fragments in the message that can be separated
from larger parts like a message body or attachments (I know little about mail protocols, so I'm
just guessing).  You may be able to replace the sections that deal with the larger chunks
using a non-blocking push model with chunking.  This might not be possible due to limitations
in antlr, though, if the entry point lexer rule blocks.  On second thought this is sounding
like a nightmare where you must swim in antlr internals.  Anyway a brain dump is a brain dump.

If the grammar is simple enough, though, I recommend hand rolling your own parser and making it
non-blocking (storing state in between the chunks fed to it).  This is a painful task and the
code will be ugly and filled with nastiness.  Sorry, at this point I have no better alternative
in mind.
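For what it's worth, here is a rough sketch of the "store state between chunks" approach, under
some loud assumptions.  The class LiteralPushParser is hypothetical, and the input it handles is
a length-prefixed literal in the style of IMAP's "{n}" CRLF form followed by n raw bytes, chosen
only because it shows both halves of the problem: a small header that needs a state machine and
a large body that should be streamed to a sink (a file, say) instead of being held in memory.
There is no error recovery and no guard against an absurdly large length.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/** Hypothetical hand-rolled push parser for "{n}\r\n" followed by n raw bytes. */
class LiteralPushParser {
    private enum State { OPEN_BRACE, LENGTH, CR, LF, BODY, DONE }

    private State state = State.OPEN_BRACE; // where we stopped in the previous chunk
    private boolean sawDigit;                // at least one length digit has been read
    private long remaining;                  // body bytes still expected
    private final OutputStream sink;         // body goes here, never into a buffer of our own

    LiteralPushParser(OutputStream sink) {
        this.sink = sink;
    }

    boolean isDone() {
        return state == State.DONE;
    }

    /** Consumes as much of the chunk as it can and returns; it never waits for more input. */
    void feed(ByteBuffer chunk) throws IOException {
        while (chunk.hasRemaining() && state != State.DONE) {
            switch (state) {
                case OPEN_BRACE: {
                    expect(chunk.get(), '{');
                    state = State.LENGTH;
                    break;
                }
                case LENGTH: {
                    byte b = chunk.get();
                    if (b >= '0' && b <= '9') {
                        remaining = remaining * 10 + (b - '0');
                        sawDigit = true;
                    } else if (b == '}' && sawDigit) {
                        state = State.CR;
                    } else {
                        throw new IllegalArgumentException("bad length byte: " + b);
                    }
                    break;
                }
                case CR: {
                    expect(chunk.get(), '\r');
                    state = State.LF;
                    break;
                }
                case LF: {
                    expect(chunk.get(), '\n');
                    state = (remaining == 0) ? State.DONE : State.BODY;
                    break;
                }
                case BODY: {
                    // Pass through only the bytes that belong to the body.
                    int n = (int) Math.min(remaining, chunk.remaining());
                    byte[] part = new byte[n];
                    chunk.get(part);
                    sink.write(part);
                    remaining -= n;
                    if (remaining == 0) {
                        state = State.DONE;
                    }
                    break;
                }
                default:
                    throw new IllegalStateException("unexpected state " + state);
            }
        }
    }

    private static void expect(byte actual, char expected) {
        if (actual != expected) {
            throw new IllegalArgumentException("expected '" + expected + "' but got byte " + actual);
        }
    }
}
```

Once isDone() returns true, any bytes left unread in the chunk belong to whatever follows the
literal, so the caller can hand the same buffer to the next parser.  It is ugly in exactly the
way described above: every rule of the grammar becomes an explicit state.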

Hope this helps!
