maven-dev mailing list archives

From: Kristian Rosenvold
Subject: Re: Parallel classloading, need review...
Date: Fri, 09 Aug 2013 18:45:57 GMT

I think we have to be quite smart to get any gains.

Once a plugin class starts loading in the classloader, the first class
referenced, "A", will most likely pull in a whole tree of classes,
sometimes a huge one. If we recorded the class loading order when
building the plugin descriptor, we could preemptively load all the
classes in all the plugins before the actual execution starts. The
surefire classes could then be loading in a separate thread while the
compiler plugin is running. Alternatively, we could record the order
during each execution and read back the data stored by the previous
execution (this is the first POC I am thinking of making).
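
A minimal sketch of the record-and-replay idea, assuming a plain
java.lang.ClassLoader rather than the real classworlds code. The names
RecordingClassLoader and Preloader are hypothetical and exist only for
illustration. Note that a wrapper like this only sees requests that are
made through it; a loader that defines the plugin classes itself would
also see the whole tree of transitive loads.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical: remembers the order in which classes were first requested. */
class RecordingClassLoader extends ClassLoader {
    // LinkedHashSet keeps first-seen order and drops repeated requests.
    private final Set<String> order = new LinkedHashSet<>();

    RecordingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> loaded = super.loadClass(name, resolve);
        synchronized (order) {
            order.add(name); // only successful loads are recorded
        }
        return loaded;
    }

    /** Snapshot of the recorded order, e.g. to store between builds. */
    List<String> recordedOrder() {
        synchronized (order) {
            return new ArrayList<>(order);
        }
    }
}

/** Hypothetical: replays a recorded order on a background thread. */
class Preloader {
    static Future<Integer> preload(ClassLoader loader, List<String> names, ExecutorService pool) {
        return pool.submit(() -> {
            int loaded = 0;
            for (String name : names) {
                try {
                    // initialize=false: load the class without running static initializers
                    Class.forName(name, false, loader);
                    loaded++;
                } catch (ClassNotFoundException | LinkageError e) {
                    // Stale entry from an older run; skip it.
                }
            }
            return loaded;
        });
    }
}
```

The recorded list would be written out at the end of one build and handed
to Preloader.preload at the start of the next, so that loading overlaps
with whatever the build is doing at that point.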

If we were able to build the actual tree of loading the classes, we
could go bottom-up (leaf first) in a separate thread, which might also
give results.
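
To illustrate the bottom-up ordering: assuming the reference tree were
available as a map from class name to the classes it references (this
map and the LeafFirstOrder name are hypothetical, nothing in the patch
produces them), a post-order walk yields a leaf-first load order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical: computes a leaf-first load order from a class reference graph. */
class LeafFirstOrder {
    static List<String> order(String root, Map<String, List<String>> references) {
        List<String> result = new ArrayList<>();
        visit(root, references, new HashSet<>(), result);
        return result;
    }

    private static void visit(String name, Map<String, List<String>> references,
                              Set<String> seen, List<String> out) {
        if (!seen.add(name)) {
            return; // already visited; also breaks reference cycles
        }
        for (String dep : references.getOrDefault(name, Collections.emptyList())) {
            visit(dep, references, seen, out);
        }
        out.add(name); // emitted only after everything it references
    }
}
```

The walk is recursive, so a very deep tree would need an explicit stack
instead.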

With the current patch (which just loosens synchronization in plexus
classworlds), when we run a parallel build with -T 4, it is safe to
assume that the plugin classloader gets populated in the same order
in all 4 threads. So basically one thread will be leading the pack,
holding the locks on the entire class tree being loaded, while the
others wait behind it. No luck, and very little performance gain,
unless one of the threads goes down a different path of execution
that changes the classloading order, in which case you might gain
something.
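
For reference, one standard way to loosen classloader synchronization on
Java 7 and later is to register the loader as parallel capable, which
replaces the single loader-wide lock with one lock per class name.
Whether the patch does exactly this is an assumption, and
ParallelCapableLoader is a hypothetical name. The sketch also shows why
the gain is small here: threads asking for the same class name still get
the same lock object.

```java
/** Hypothetical loader that opts in to per-class-name locking (Java 7+). */
class ParallelCapableLoader extends ClassLoader {
    static {
        // Has to happen in the static initializer, before any instance exists.
        registerAsParallelCapable();
    }

    ParallelCapableLoader(ClassLoader parent) {
        super(parent);
    }

    /** Exposes the lock that loadClass would synchronize on for this name. */
    Object lockFor(String className) {
        return getClassLoadingLock(className);
    }
}
```

Two threads loading different classes take different locks and can
proceed independently; two threads loading the same class in the same
order queue up on the same lock.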


2013/8/9 Romain Manni-Bucau:
> Yeah, the main issues were:
> 1) you need to parallelize the whole process
> 2) your hard drive needs to be parallelized (didn't find a free solution
> to this one)
> 3) you need to load independent classes
> So generally the gain is not as impressive as parallelization sounds.
> On 9 Aug 2013 19:52, "Kristian Rosenvold" wrote:
>> 2013/8/9 Romain Manni-Bucau:
>> > When I tested on TomEE the gain was ridiculously small too, so maybe
>> > not the first place to hack on to make maven fast ;)
>> > On 9 Aug 2013 18:36, "Jason van Zyl" wrote:
>> >> And what's the net difference before and after trying to parallelize
>> >> the classloading? I'll read up on the Java7 classloading this weekend.
>> I think this really depends on how we're able to exploit it. Our
>> domain is partitioned into lots of small classloaders, so there should
>> be a bit of potential. How did you try to partition your classloading
>> in TomEE? From what I've seen of "asm" performance, class loading is
>> mostly IO.
>> Within a single classloader I think you'd need some kind of
>> preemptive/recording based strategy. Implementing that in the
>> ClassRealm class in classworlds should be almost trivial, and unless
>> someone beats me to it, I'll do that over a few glasses of red wine
>> some time. (Record class loading order from one invocation and re-use
>> it in another.)
>> Parallel construction of multiple classloaders should have some potential.
>> As for "making maven fast", well that's a topic I've spent
>> considerable time & energy on.
>> Apart from class loading, pom loading, pom merging and artifact
>> resolution are basically the computationally intensive parts of the
>> maven core. Class loading and artifact resolution are the big ones;
>> the actual XML parsing/merging is really not that much.
>> Most of the inefficiencies are in plugins. And sometimes there are
>> inefficiencies related to layering. An example of this is
>> maven-install-plugin; it uses maven core to install (copy) the jar
>> file into the local repository, but then it re-reads the file to
>> calculate SHA1/MD5 checksums. Until recently it actually read the files
>> 3 times; I just reduced that to 2 times.
>> I have been profiling the heck out of a bunch of builds, and the big
>> stuff is in the plugins. For maven core I think it is safe to say
>> you'll need to look for algorithmic improvements to gain anything
>> significant; stuff like requesting a bunch of artifacts from the
>> remote repository in one HTTP request comes to mind. One could work on
>> parallelizing classloading, which should be doable. Other than that
>> there's not much left.
>> As a theory for my really long runs in the woods, I consider
>> parallelizing the entire pom loading, interpolation and artifact
>> resolution process. Unfortunately the massive amount of mutable state
>> within the maven model and the maven core makes this infeasible.
>> Simply put, the availability of setters all over the place allows the
>> construction of models/data to decay to spaghetti. Such spaghetti also
>> creates wasted computation, since the same values are recalculated
>> repeatedly. It also hinders parallelization. Maven core has its share
>> of such spaghetti. On my last long run in the woods I contemplated
>> writing another totally immutable layer of objects beneath the current
>> objects and simply transferring all the state to the current model objects
>> when done. But we're looking at quite a tremendous effort to catch
>> that last second of wasted computation - better spend that energy
>> optimizing plugins :)
>> On the non-radical front, parallel classloading is probably the last
>> "simple" thing that can be optimized in core.
>> For multi-module builds there's the potential of re-using state/data
>> computed in one module for the next. Surefire could conceivably keep
>> the forked process alive between modules if the classpath is only
>> expanded in the next run. Or surefire could run an additional
>> invocation early in the lifecycle and start the forked VM while the
>> compiler plugin is running (if it forks, which it can decide early);
>> although the actual .class files may not be available, it knows
>> everything it needs to know.
>> Kristian