lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: (LUCENE-1844) Speed up junit tests
Date Fri, 27 Nov 2009 18:27:31 GMT
On Fri, Nov 27, 2009 at 10:52 AM, Erick Erickson
<> wrote:
> But then I got to thinking..... I admit I've only scratched the
> surface of the JUnit4 parallelization stuff. That said, it
> seems like the real benefit comes from making use of
> multiple cores; we don't get huge speedups just from
> running multiple threads at once on a single core. Which
> makes sense if you're not doing much in the way of I/O.

Right, it's the multi-core machines that gain the most from this.

> This notion was inspired by the "scary Python script"
> comment.....
> So what if we use Ant ForEach construct instead? Yet
> again this is a fuzzy idea I'm throwing out without much
> to back it up. Mostly I'm wondering if anyone's thought about
> it before or can shoot it down before it takes wing. Or if
> it is worth exploring.
> Assuming we structure our test directories so there are only
> directories at the root of the test area, could we persuade Ant
> to fire off the tests N directories at a time in parallel?
> N would default to 1 but could be passed in to the task, something
> like -DmaxThreads=4. ForEach actually has a maxThreads
> parameter..... In fact, we wouldn't even need to have only directories
> at the test root, but the individual test files at the root would probably
> be inefficiently run.
> I suspect that keeping the test directories in balance would be
> much less work than trying to parallelize using JUnit4, and be
> much less fraught with gremlins. This assumes we get
> sufficient isolation by Ant running separate threads, about
> which I have absolutely NO information. Like I said, mostly
> I'm wondering if anybody's gone down this path before and
> has wisdom to offer.

I think this rough idea is a good approach, though I don't know much
about ant's ForEach.
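Here is a minimal Python sketch of the partitioning step behind the directory-per-unit idea. The helper name and the paths are hypothetical and are not part of the Lucene build. Each top-level directory under the test root becomes one unit of work. A test file sitting directly at the root becomes a unit of its own, which is why root-level files would be scheduled inefficiently.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_top_dir(test_paths):
    """Group test paths (relative to the test root) into parallel work units.

    Hypothetical helper: each top-level directory is one unit; a file that
    sits directly at the root ends up as a unit containing only itself.
    """
    groups = defaultdict(list)
    for p in test_paths:
        # parts[0] is the top-level directory, or the file name if at the root
        key = PurePosixPath(p).parts[0]
        groups[key].append(p)
    return {k: sorted(v) for k, v in sorted(groups.items())}


units = group_by_top_dir(
    ["index/TestA.java", "search/TestB.java", "index/TestC.java", "TestRoot.java"]
)
```

Each resulting unit could then be handed to a runner that uses at most `maxThreads` workers, with 1 as the default.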

One thing the scary Python script does is divide up index & search
packages into 2 parts ("a" and "b"), by breaking up the tests
according to 1st letter.  We might be able to take a similar approach,
so that we're not forced to unnaturally separate tests into subdirs?
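The first-letter split might be sketched as follows. This is not what the script actually does. The sketch assumes that test classes share a "Test" prefix, which should be skipped so the split spreads the tests out, and it uses an arbitrary split point of "M". The helper name and both defaults are illustrative assumptions.

```python
def split_by_first_letter(test_names, split_at="M", prefix="Test"):
    """Divide test class names into an "a" half and a "b" half.

    Hypothetical helper: the letter after the shared prefix decides the half;
    letters sorting before `split_at` go to "a", the rest go to "b".
    """
    halves = {"a": [], "b": []}
    for name in test_names:
        # Skip the common prefix, otherwise every name would start with "T"
        key = name[len(prefix):] if name.startswith(prefix) else name
        half = "a" if key[:1].upper() < split_at.upper() else "b"
        halves[half].append(name)
    return halves
```

In practice the split point would need tuning so that the two halves take roughly equal time, rather than holding an equal number of classes.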

The entire index or search package was too slow to run otherwise (ie,
I needed to throw concurrency at it).

> Which *still* doesn't mean we shouldn't do whatever we can
> to speed up individual tests, but looking at the timings there's
> no obvious low-hanging fruit....

Yup.  It's definitely an ongoing thing too...

> I wonder if we could somehow run the various directories in
> time order, longest-to-shortest in the hope that all the threads
> would finish up "close enough" to the same time. I haven't
> thought about *how* to make this happen yet though....

This is very important -- I do the same thing in the python script.
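Running longest-to-shortest is the classic longest-processing-time-first greedy heuristic. Jobs are taken in decreasing order of duration, and each one goes to whichever worker currently has the least total work. The sketch below is minimal and assumes per-unit durations are already known, for example from a previous run's timings. The unit names and seconds are invented for illustration.

```python
import heapq


def lpt_schedule(durations, n_workers):
    """Assign jobs longest-first, each to the currently least-loaded worker.

    durations: mapping of job name -> estimated seconds (assumed known).
    Returns (list of job names per worker, makespan in seconds).
    """
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    for job, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignment[w].append(job)
        heapq.heappush(heap, (load + secs, w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


assignment, makespan = lpt_schedule(
    {"index": 300, "search": 200, "store": 100, "util": 100, "analysis": 100}, 2
)
```

With these made-up numbers both workers finish at 400 seconds (`makespan == 400.0`), which is the best possible split of 800 seconds of work.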

Also, will ant's ForEach take a set of say 30 things to work on, and
take the # threads to use, and just pull from that queue of 30, in

> Anyway, I'll be happy to pursue this if y'all think it has merit,
> let me know and I'll open a JIRA and take it on. For the
> benefit of those aforementioned *real* people with *real*
> machines, who I'll rely upon to help test this notion....
> Is the poor man's version of this on a dual-core machine
> just running "test-core" and "test-contrib" in two separate
> windows?

I think you could, except that they share sub-tasks (eg,
"compile-core") so the two will sometimes stomp on each other.

The scary python script first uses a single thread to compile
everything, then runs N threads pulling from the queue.  BUT: I apply
a temporary patch to the ant build files, so that the N threads do not
try to, eg, compile-core or jar-core, separately.
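The compile-once-then-fan-out structure could look roughly like the Python below. It does not reproduce the actual script or its ant patch. `compile_all` and `run_one` are hypothetical callables. The first stands in for the single-threaded compile step, and the second for launching one unit of tests, probably as a subprocess call to ant. All items are queued before any worker starts, so a worker that finds the queue empty can simply exit. An idle thread always takes the next item, which matches the "pull from a queue of 30" behavior asked about for ForEach.

```python
import queue
import threading


def run_parallel(compile_all, run_one, work_items, max_threads=1):
    """Compile once on the calling thread, then let max_threads workers drain a queue.

    compile_all(): hypothetical shared build step, run exactly once up front
        so the workers never race on it.
    run_one(item): hypothetical runner for one unit of tests; its return
        value is collected per item.
    """
    compile_all()
    q = queue.Queue()
    for item in work_items:  # queue longest units first for the best balance
        q.put(item)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return  # nothing left; this thread is done
            outcome = run_one(item)
            with lock:
                results[item] = outcome

    threads = [threading.Thread(target=worker) for _ in range(max(1, max_threads))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Threads are sufficient here because each worker would mostly be waiting on a child process. If `run_one` raises, only that worker stops, and the others keep draining the queue.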

Also one thing I'd love to try is NOT forking the JVM for each test
(fork="no" in the junit task).  I wonder how much time that'd buy...

