geronimo-dev mailing list archives

From Alex Blewitt <>
Subject Re: Geronimo Deployment Descriptors -- and premature optimisation
Date Tue, 09 Sep 2003 18:53:52 GMT
On Tuesday, Sep 9, 2003, at 17:44 Europe/London, Jeremy Boynes wrote:

>> However, it doesn't necessarily mean that it can't generate the XML,
>> rather than a binary-compatible format that Jeremy was suggesting. An
>> XML document will always be more portable between versions than a
>> generated bunch of code, because when bugs are fixed in the latter you
>> have to regenerate, whereas with an XML file you don't.
> Please do not think I am thinking binary is the only way to go - that
> notion was discarded back in EJB1.0 days. What I want is to have it as
> an option.

Can I make a few observations here:

o Assumption: large XML files take a long time to parse, therefore the
server will be slow to start up
o Assumption: the way to solve that is with the deploy tool, and  
possibly a combined XML+binary format.

I think there are other solutions to the problem than just these.  
Whilst it is true the XML file parsing can take some time, it's not  
actually likely to be where the amount of time is taken up in the  
server. If we had metrics to prove it, I'd shut up, but we don't.

I'd postulate that we would be able to fire up the server faster if we
used different optimisations; for example, a multi-threaded startup
(like that provided by Avalon) instead of a single-threaded model; an
on-the-fly parse of the XML file instead of parsing into a DOM/POJO;
ditching the JMX layer and using Java method calls; and so on.
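To illustrate what I mean by an on-the-fly parse, here is a minimal sketch using the standard SAX API. The class and handler names are made up for illustration, and the tiny descriptor is only a stand-in for a real ejb-jar.xml; it is not how Geronimo does (or should) read descriptors, just the general shape of a streaming read that never builds a tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: streams a descriptor and keeps only bean names.
class StreamingDescriptorScan {

    /** Collects <ejb-name> values as the parser streams past them; no DOM is kept. */
    static class BeanNameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside <ejb-name>

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("ejb-name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("ejb-name".equals(qName)) {
                names.add(text.toString().trim());
                text = null;
            }
        }
    }

    static List<String> beanNames(String xml) throws Exception {
        // Default factory: not namespace-aware, no validation.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BeanNameHandler handler = new BeanNameHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ejb-jar><enterprise-beans>"
                + "<session><ejb-name>Cart</ejb-name></session>"
                + "<entity><ejb-name>Order</ejb-name></entity>"
                + "</enterprise-beans></ejb-jar>";
        System.out.println(beanNames(xml)); // prints [Cart, Order]
    }
}
```

Whether this beats a DOM/POJO load by enough to matter is exactly the kind of thing that would need measuring.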

But we don't *know* that this is where the bottleneck is. It may be,  
and we can run tests to show that in a simple scenario, option A is  
faster than option B, but that doesn't mean that that's where the  
bottleneck will be in the server.

But if it takes (say) 10 or 100 times as long to dynamically create the
bean as it does to parse the XML, we are solving the wrong problem. Don't
get me wrong, I don't know how much time it takes to create a bean -- but
we don't seem to have any profiling data to choose between the various
options. It could even be the case that a more optimised XML parser would
solve the problem, or a different way of creating the POJOs.
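The kind of measurement I have in mind does not need to be elaborate. As a rough sketch (the class name, phase names and the sleeps standing in for real work are all invented for illustration; a real profiler would give far better data), timing each startup phase separately would at least show which one dominates:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

// Hypothetical helper: accumulates elapsed nanoseconds per named startup phase.
class StartupPhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
    private final LongSupplier clock;

    StartupPhaseTimer() { this(System::nanoTime); }

    // The clock is injectable so the timer itself can be checked deterministically.
    StartupPhaseTimer(LongSupplier clock) { this.clock = clock; }

    /** Runs the work and adds its elapsed time to the named phase. */
    <T> T time(String phase, Callable<T> work) throws Exception {
        long start = clock.getAsLong();
        try {
            return work.call();
        } finally {
            nanosByPhase.merge(phase, clock.getAsLong() - start, Long::sum);
        }
    }

    Map<String, Long> nanosByPhase() { return nanosByPhase; }

    /** Returns the phase with the largest accumulated time, or null if none ran. */
    String slowestPhase() {
        String slowest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    public static void main(String[] args) throws Exception {
        StartupPhaseTimer timer = new StartupPhaseTimer();
        // The sleeps are placeholders for the real parse and bean-creation steps.
        timer.time("parse-xml", () -> { Thread.sleep(5); return null; });
        timer.time("create-beans", () -> { Thread.sleep(50); return null; });
        System.out.println(timer.nanosByPhase());
        System.out.println("slowest: " + timer.slowestPhase());
    }
}
```

With numbers like these from a realistic deployment, we would know whether the XML parse is worth optimising at all.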

I'd also like to disagree that this optimisation should be done by the  
deployer. Why not have it done by the server when the code is deployed?  
Sure, you wouldn't want it to happen every time the server starts (like  
compiling JSPs) -- so dump out a binary representation at the server  
side, and drop that cache when the application gets redeployed. That  
way, you still get the fast startup (2nd time onwards) whilst
maintaining portability and without imposing any extra burden on
the developer.
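A minimal sketch of that server-side cache, under some loud assumptions: the class, the Parser interface and the file names are hypothetical, Java serialization is used only as a private, throwaway cache format (not as something a deployer ever sees), and staleness is judged by file modification time, which can miss an edit made within the filesystem's timestamp granularity.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;

// Hypothetical server-side cache: XML stays the portable source of truth.
class DescriptorCache {

    /** Stand-in for whatever turns the XML descriptor into an in-memory model. */
    interface Parser {
        Serializable parse(Path xml) throws IOException;
    }

    /** Returns the cached model if it is at least as new as the XML, else reparses. */
    static Serializable load(Path xml, Path cache, Parser parser)
            throws IOException, ClassNotFoundException {
        if (Files.exists(cache)
                && Files.getLastModifiedTime(cache).compareTo(Files.getLastModifiedTime(xml)) >= 0) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cache))) {
                return (Serializable) in.readObject();
            }
        }
        Serializable model = parser.parse(xml);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cache))) {
            out.writeObject(model);
        }
        return model;
    }

    /** Redeployment simply throws the cache away; the next start rebuilds it. */
    static void onRedeploy(Path cache) throws IOException {
        Files.deleteIfExists(cache);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("descriptor-cache");
        Path xml = dir.resolve("ejb-jar.xml");
        Path cache = dir.resolve("ejb-jar.ser");
        Files.write(xml, "<ejb-jar/>".getBytes(StandardCharsets.UTF_8));

        int[] parses = {0};
        Parser parser = p -> {
            parses[0]++;
            return new ArrayList<String>(Files.readAllLines(p));
        };

        load(xml, cache, parser); // first start: parses and writes the cache
        load(xml, cache, parser); // second start: served from the cache
        onRedeploy(cache);
        load(xml, cache, parser); // after redeploy: parses again
        System.out.println("parses = " + parses[0]); // prints parses = 2
    }
}
```

If the cache file is lost or the server is upgraded, nothing breaks: the XML is still there and the cache is rebuilt on the next start.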

> For example, parsing the XML with full schema validation is a dog - on
> my machine even a simple file takes a couple of seconds and a couple of
> MB and I am concerned about a) large applications with hundreds of
> modules taking forever to start, and b) applications trying to run with
> constrained resources. And yes, we do need to consider these things :-)

But if you had that large an application, how long would you expect it
to take to start up? Realistically, what is the largest size of app you've
had to deal with? Most web-apps have just a single servlet these days (à la
Struts), so the only issue is with EJBs, and with 1000 EJBs you're  
still looking at 1k of data/EJB to make a 1MB file. That's a hell of a  
lot. And do we know how long it takes to deploy 1000 EJBs once the XML  
file has loaded? Are we seriously saying that we expect that part of  
the process to take dramatically less than 2s? If not, then the  
bottleneck isn't going to be at the XML parsing stage.

> We have also had proposals for storing configuration information in
> repositories and relational databases, neither of which would allow
> vi-style access to the XML. A binary format may well be a better option
> for them.

IMHO 'vi'-style access is not the sole reason to use XML files.
I am personally more a fan of storing the configuration in
LDAP, which will be slower still than having it in XML files. But I  
wanted to raise a big 'no' to a binary file format, including any  
serialized concepts of MBeans which would then have real difficulty in  
being interpreted if we ever managed to break away from JMX. No, I  
don't think it will happen soon, but I can hope :-) See Elliotte's  
comments on XML and binary at (or the  
cached version at 
50.html+%22Compress+if+space+is+a+problem%22&hl=en&ie=UTF-8 since I  
couldn't see it on the former)

> Think of it like JSP: some people want to pre-compile, and this is
> *very* common in production environments.

I don't see the two being that comparable. A site may have many
hundreds of JSPs with several k of data in them each, and they take
(relatively speaking) a long time to parse, translate, and then
compile. I don't see parsing an EJB-JAR.xml file as being in the same
order of magnitude.

I don't disagree that we can cache an internal form to speed up
startup; I just don't think it should be something the deployment tool
has to do. Same with JSPs; we can upload them into Geronimo, and then
a background process can pre-compile them when resources are available.  
I don't think we should force the developer to decide between the two.  
[What other JSP engines get wrong is that it's necessary to precompile  
all JSPs before deployment. It's not; they just need to be compiled  
before the user sees them. The process should be Deploy -> run app ->  
precompile all possible next JSPs that you can move to.]
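As a rough sketch of the deploy-then-compile-in-the-background idea: everything here is hypothetical (the class name, and the compiler function standing in for a real JSP translator), and for simplicity it queues every page at deploy time rather than only the pages reachable from the current one. A request for a page that has not been reached yet just compiles it on the spot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: deployment returns at once, compilation happens later.
class BackgroundPrecompiler<C> {
    private final Map<String, C> compiled = new ConcurrentHashMap<>();
    private final Function<String, C> compiler; // stand-in for the real JSP compiler
    private final ExecutorService background = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "jsp-precompile");
        t.setDaemon(true);
        t.setPriority(Thread.MIN_PRIORITY); // only a hint to the scheduler
        return t;
    });

    BackgroundPrecompiler(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    /** Called at deploy time: queues the pages and returns immediately. */
    void deploy(List<String> pages) {
        for (String page : pages) {
            background.execute(() -> get(page));
        }
    }

    /** Called on a request: compiles now if the background thread has not got there yet. */
    C get(String page) {
        // computeIfAbsent guarantees each page is compiled at most once.
        return compiled.computeIfAbsent(page, compiler);
    }

    /** Stops accepting work and waits briefly for queued compilations to finish. */
    void shutdown() throws InterruptedException {
        background.shutdown();
        background.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BackgroundPrecompiler<String> jsps =
                new BackgroundPrecompiler<>(page -> "compiled:" + page);
        jsps.deploy(Arrays.asList("index.jsp", "cart.jsp", "checkout.jsp"));
        System.out.println(jsps.get("cart.jsp")); // served whether or not the queue reached it
        jsps.shutdown();
    }
}
```

The developer never has to choose between precompiled and not; the server gets there on its own when it has spare cycles.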

Premature optimisation is the root of all evil.

