lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: (LUCENE-1844) Speed up junit tests
Date Fri, 27 Nov 2009 20:38:30 GMT
<<<Also, will ant's ForEach take a set of say 30 things to work on, and
take the # threads to use, and just pull from that queue of 30, in
order?>>>

That's the implication I took from here:
http://ant-contrib.sourceforge.net/tasks/tasks/index.html

Ignorance is bliss: I didn't find ForEach by looking at the Ant
documentation, but by googling "ant parallel". It turns out this
is in Ant-Contrib. I don't even know if it's current.

Tell ya' what. I'll take a quick whack at it. I'm a believer
in prototyping if at all possible. So I'll create a really stupid
implementation of this with a hard-coded list of tests to run
and see what happens. If it works for me, I'll pass it along
to whoever wants to give it a spin and we'll get a clue whether
it provides enough of an improvement to pursue seriously.
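
For illustration only, here is a minimal Python sketch of the behaviour being asked about: a fixed number of workers pulling from one queue, in order, with the queue optionally sorted longest-to-shortest as discussed in the quoted message below. It is a simulation, not a claim about how Ant-Contrib's ForEach actually schedules work, and the function name `simulate` and the durations are made up for the example.

```python
import heapq

def simulate(durations, max_threads):
    """Return the finish time when max_threads workers pull tasks
    from a single queue in the given order (simulation only)."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * max_threads
    heapq.heapify(free_at)
    finish = 0
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest-free worker takes the next task
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

# Hypothetical per-directory test times, in arbitrary units.
durations = [1, 1, 1, 1, 4]

as_listed = simulate(durations, max_threads=2)
longest_first = simulate(sorted(durations, reverse=True), max_threads=2)
print(as_listed, longest_first)
```

With these made-up numbers, putting the long task first lets the second worker absorb all the short ones, so the two workers finish at about the same time instead of one idling while the other runs the long task last.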

I'll open a JIRA since at least Mike and I seem to be interested....

Erick

On Fri, Nov 27, 2009 at 1:27 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Fri, Nov 27, 2009 at 10:52 AM, Erick Erickson
> <erickerickson@gmail.com> wrote:
> > But then I got to thinking..... I admit I've only scratched the
> > surface of the JUnit4 parallelization stuff. That said, it
> > seems like the real benefit comes from making use of
> > multiple cores, we don't get huge speedups just from
> > running multiple threads at once on a single core. Which
> > makes sense if you're not doing much in the way of I/O.
>
> Right, it's the multi-core machines that gain the most from this.
>
> > This notion was inspired by the "scary Python script"
> > comment.....
> >
> > So what if we use Ant ForEach construct instead? Yet
> > again this is a fuzzy idea I'm throwing out without much
> > to back it up. Mostly I'm wondering if anyone's thought about
> > it before or can shoot it down before it takes wing. Or if
> > it is worth exploring.
> >
> > Assuming we structure our test directories so there are only
> > directories at the root of the test area, could we persuade Ant
> > to fire off the tests N directories at a time in parallel?
> > N would default to 1 but could be passed in to the task, something
> > like -DmaxThreads=4. ForEach actually has a maxThreads
> > parameter..... In fact, we wouldn't even need to have only directories
> > at the test root, but the individual test files at the root would probably
> > be inefficiently run.
> >
> > I suspect that keeping the test directories in balance would be
> > much less work than trying to parallelize using JUnit4, and be
> > much less fraught with gremlins. This assumes we get
> > sufficient isolation by Ant running separate threads, about
> > which I have absolutely NO information. Like I said, mostly
> > I'm wondering if anybody's gone down this path before and
> > has wisdom to offer.
>
> I think this rough idea is a good approach, though I don't know much
> about ant's ForEach.
>
> One thing the scary Python script does is divide up index & search
> packages into 2 parts ("a" and "b"), by breaking up the tests
> according to 1st letter.  We might be able to take a similar approach,
> so that we're not forced to unnaturally separate tests into subdirs?
>
> The entire index or search package was too slow to run otherwise (ie,
> I needed to throw concurrency at it).
>
> > Which *still* doesn't mean we shouldn't do whatever we can
> > to speed up individual tests, but looking at the timings there's
> > no obvious low-hanging fruit....
>
> Yup.  It's definitely an ongoing thing too...
>
> > I wonder if we could somehow run the various directories in
> > time order, longest-to-shortest in the hope that all the threads
> > would finish up "close enough" to the same time. I haven't
> > thought about *how* to make this happen yet though....
>
> This is very important -- I do the same thing in the python script.
>
> Also, will ant's ForEach take a set of say 30 things to work on, and
> take the # threads to use, and just pull from that queue of 30, in
> order?
>
> > Anyway, I'll be happy to pursue this if y'all think it has merit,
> > let me know and I'll open a JIRA and take it on. For the
> > benefit of those aforementioned *real* people with *real*
> > machines, who I'll rely upon to help test this notion....
> >
> > Is the poor man's version of this on a dual-core machine
> > just running "test-core" and "test-contrib" in two separate
> > windows?
>
> I think you could, except, I think they share sub-tasks (eg,
> "compile-core") so the two will sometimes stomp on each other.
>
> The scary python script first uses a single thread to compile
> everything, then runs N threads pulling from the queue.  BUT: I apply
> a temporary patch to the ant build files, so that the N threads do not
> try to, eg, compile-core or jar-core, separately.
>
> Also one thing I'd love to try is NOT forking the JVM for each test
> (fork="no" in the junit task).  I wonder how much time that'd buy...
>
> Mike
>
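
The quoted idea of splitting a large package's tests into "a" and "b" parts by first letter could look roughly like the sketch below. The exact rule used by the Python script is not given in the thread, so everything here is an assumption: the helper name `split_by_first_letter`, the pivot letter, the choice to ignore a leading "Test" prefix, and the example class names are all hypothetical.

```python
def split_by_first_letter(test_names, pivot="N"):
    """Split test names into two lists by first letter (assumed rule).

    A leading "Test" prefix is ignored so that names such as TestAlpha
    are grouped by "A" rather than all falling under "T".
    """
    part_a, part_b = [], []
    for name in test_names:
        key = name[4:] if name.startswith("Test") else name
        first = key[:1].upper()
        # Letters before the pivot go to part "a", the rest to part "b".
        if first < pivot:
            part_a.append(name)
        else:
            part_b.append(name)
    return part_a, part_b

# Hypothetical test class names, for illustration only.
names = ["TestAlpha", "TestZeta", "TestBeta", "TestNu"]
part_a, part_b = split_by_first_letter(names)
print(part_a, part_b)
```

Each half could then be queued as its own unit of work, which avoids moving tests into separate subdirectories just to balance the load.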
