xml-general mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [spinnaker] Announce
Date Mon, 10 Jul 2000 19:14:31 GMT
Arnaud Le Hors wrote:
> Now looking at a few specific points.
> If this has been seen as a Sun vs IBM war it can only be because of the
> way James first presented this new project. I quote:
> James Duncan Davidson wrote:
> > After quite a bit of
> > discussion, the rest of the XML team at Sun, the people who are responsible
> > for the parser that will ship in the core of future JDKs, agree as well. It
> > is important to stress that we want to ship an Apache based parser in the
> > JDK for all the reasons that you'd expect.
> It is crystal clear that this is all about Sun, not James as an individual,
> despite his later claims. Just a fact.

I already wrote that I was surprised to see this written by James since
I already knew his intentions and we have been discussing about this

In fact, I would not have replied to this thread if it were not for this
simple fact.
> Stefano Mazzocchi wrote:
> > > So far the only proof I've got is that Hotspot miserably fails on
> > > Xerces. This means to me that Hotspot has a problem, not xerces.
> >
> > Bullshit.
> >
> > I used to optimize x86 assembly code by hand for the Pentium I dual
> > pipeline; it worked great for Pentium I but failed miserably with
> > Pentium II machines compared to what the C compiler produced
> > automatically.
> Did your code work slower on the Pentium II than on the Pentium I or
> x86? I doubt it.

No, of course not.

> It's also my experience that in the long run your code
> works faster when you let the compiler do the optimizations. But it's
> beside the point here. What we are seeing with Hotspot is that it is
> slower than a regular VM (with a JIT). Profiling reveals that it fails
> to compile most of the code and runs most of it in interpreted mode. Not
> only that, but it performs better when profiling is on!! If this isn't a
> proof that something is really wrong in Hotspot I don't know what is!
> Besides, if anyone (and people from Sun in particular since I expect
> them to be the best experts in the matter) would help improve Xerces
> performance over Hotspot it would be great!!!

To optimize 486 asm I used unrolled loops a lot. Then the Pentium came
in and had jump prediction logic: it's sort of the very beginning of
hotspot heuristics. Right there, loop unrolling became "bad practice"
because it increased memory usage and cost of ownership without giving
any particular benefit (in fact, larger L1 caches meant optimized
execution for tight loops).

The optimized 486 code was most of the time _slower_ under P1 than
C-compiled code.

I think you are experiencing the exact same problem on a different
scale. Jump prediction for x86 asm is, today, hotspot heuristics for
Java bytecode. Much more complex, but same impact on optimized code.

This said, I totally agree with you that having collaboration between
the Hotspot engineers and the Xerces project to "know what's going on"
would make an incredible difference.

Whether this requires creating a new codebase from scratch, I cannot
say.
> Stefano Mazzocchi wrote:
> > > >     * However, because Xerces was heavily pre-optimized, it's
> > > >       extremely complex to understand and delve into. I think
> > > >       that this is best reflected in that most of the bits that
> > > >       go into Xerces come from IBM Cupertino.
> > >
> > > Not so. What you're referring to as "IBM Cupertino" is hardly a fixed set
> > > of people. We've actually had a lot of turnover and we keep getting new
> > > people involved in this project all the time. This hasn't prevented any
> > > of them from contributing significantly. The only reason most bits come from
> > > IBM is that nobody else has committed as many resources to this project.
> >
> > I don't give a shit about who is paying whom to do anything as long as
> > what is being done is good for me. I'm happy with Xerces and I use it.
> > But if somebody is not, they have all the rights in the world to do
> > something about it and if the community is not open enough to listen, to
> > prove their points by creating new code and let the community decide.
> >
> > This is _NOT_ an IBM vs. Sun thing and I suggest everyone on this list
> > ignore any posts that go in that direction.
> You're completely missing the point. My point is that James's argument
> that the fact that most bits come from IBM Cupertino proves that Xerces
> is too complex is bogus. This has nothing to do with IBM vs Sun. Read
> what I write not what you want to read.

Sorry if I missed your point.
> One last point, this is hardly a matter of misplaced sensitivity. None
> of the authors of the original code are still involved in this project.
> Most of us have only been involved in this for a few months. I just
> think there is much more value in helping to improve the current code
> than starting a competing project in parallel. For one thing, I don't
> think this project can afford wasting resources. Andy Clark, Jeff
> Rodriguez, and Eric Ye are struggling to finish up the implementation of
> XML Schemas. I'm sure they wouldn't mind any help.
> This is not to say that it's wrong to make experiments on the side. This
> is to me a natural way of making progress. We do that all the time. As a
> matter of fact I have three checkouts of the xerces source tree in which
> I keep experimenting with various ideas. In one of them I've even merged some
> of the code from the crimson DOM into xerces DOM.
> It's just that I have more faith in evolution than revolution.

I do too, trust me. I hate revolutions. They are the most egocentric
way to express your vision; they create lots of friction and can take
momentum away from the project. And I know this because I did my own.

I would not have the energy to fight another revolution. In fact, I
have tried to make the projects I help with open enough to avoid such
friction.

Like I said, if a revolution happens it's because the project
development community has problems. These problems cannot be fixed by
simply saying: you guys are not open enough.

You'd get pissed. You can't see your face without a mirror. A revolution
is such a mirror.

None of Tomcat, Xerces and Xalan started as Apache projects: their
development communities were _imposed_ and did not emerge from a
community of volunteers.

For Tomcat, the Sun people were working in the open like they normally
did before the deal: they just had to show their results on a public CVS
every day.

Then someone got really frustrated by that and started to take over:
cleaning up, doing things, answering questions, volunteering for release
coordination, blah, blah. That guy was Sam Ruby from IBM. He started
distributed development for the Tomcat team: they could not have
a morning meeting to decide something, they needed to contact Sam, then
In both Xerces and Xalan, this hasn't (yet) happened.

James is trying (the wrong way? I can't tell) to do this: unlock the
development community.

Careful: I'm not saying you guys are doing this "on purpose", of course
not. It's just the way it is, and Xerces is _too_good_ to generate such
an itch to scratch.

Yes, people, it takes a long time to understand why good code does more
harm than crappy code to open source. Bug-free software is just used;
buggy software is worked on and fixed, and in doing so the development
community grows.

I'm not saying you should break Xerces on purpose, no, it's just that
you must start off a new codebase to get people to help you.

This is why I told Scott to release Xalan 2.0 as soon as it compiles:
it would create the development community that he's been asking for for
so long.

James is just trying to do the same: start a new codebase to create the
itches to scratch.

Is it wrong to give it another codename? Probably.
Is it wrong to place it in another CVS module? Probably.
Is it wrong to go off without asking first? Probably.

So, let me propose another option:

1) we forget about spinnaker
2) we create a new CVS branch under xml-xerces where Xerces2 should
3) if needed, we create a xerces2-dev mailing list

What do you think?

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<stefano@apache.org>                             Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
