lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Parallel tests in Benchmark
Date Sat, 03 Apr 2010 13:26:25 GMT
OK, let's do that (add runsequential to benchmark and all the rest). If
I run into this elsewhere as well, I will report it and we can then talk
about trying to find a solution. If it's just benchmark,
then I think we'll be OK.

Shai

On Thursday, April 1, 2010, Robert Muir <rcmuir@gmail.com> wrote:
> On Thu, Apr 1, 2010 at 12:03 AM, Shai Erera <serera@gmail.com> wrote:
>
>
> Hi
>
> I'd like to summarize a discussion I had w/ Robert and Mike last night on IRC, about the parallelism of tasks in Benchmark:
>
> For some reason, ever since parallel tasks were introduced, when I run 'ant test' from the contrib/benchmark folder (or the root), the tests just hang at some point, after WriteLineDocTaskTest finishes. What's very weird is that it seems I'm the only one experiencing this, and so for a long time I thought it was just a problem w/ my environment ... until yesterday, when I did a fresh checkout of trunk to a fresh folder and project, and the tests still got stuck.
>
> The thread dump does not show anything relevant to Lucene code, but rather to Ant. The main thread is waiting on org/apache/tools/ant/taskdefs/Parallel.spinThreads, another on org/apache/tools/ant/taskdefs/Execute.waitFor and two others on java/io/FileInputStream.read. But nothing is directly related to Lucene code. Also annoyingly, but conveniently for debugging the issue, it happens very consistently on my machine - sometimes the test passes, but 90% of the runs hang.
> Running w/ -Drunsequential=1 consistently succeeds.
>
> We've explored different ways to understand the cause of the problem, and came across several improvements and a workaround, but unfortunately did not reach a definite resolution:
>
> * As a last resort, we can add a runsequential property to benchmark's build.xml, which forces the Benchmark tests to run sequentially. Since that's a tiny package which takes a few seconds to run anyway, and parallelism doesn't improve much (it actually runs slower, when it passes, on my machine: parallel=15 sec, seq=11 sec), this might be acceptable.
>
> * Moving the junit temp files (such as that flag file) to the temp directory each test uses. This is actually a good thing to do anyway (thanks Robert for spotting that), because it avoids accidental commits of such files :) and doesn't clutter the main environment. We did that because when I hit Ctrl+C to stop one of the runs which hung, we received a FNFE on a junit flag file, "file is being accessed by another process" (something like that), and thought this was related to the hangs I'm seeing. Anyway, multiple JVMs attempt to access this file concurrently, which seems bad.
>
> * Explore the JUnit Formatter code under src/test, since it uses file locking. I've disabled locks (using NoLockFactory), however the test still hung.
>
> * Change common-build.xml threadsPerProcessor to '1' instead of '2'. We think that might be a good thing to do anyway - if people run on machines with just one CPU, threading is not expected to help much, as opposed to running on multiple CPUs. But we don't want to enforce it on anyone, so we're thinking of changing the default to '1' and introducing a property 'threadsPerProcessor' which users will be able to set explicitly.
> ** Surprisingly, when I set it to '1' or '10' (I run on a dual-core Thinkpad W500), the test consistently passes - it just doesn't like the value '2'. At least it passed for as long as I ran it; maybe a thread hang is still lurking around the corner somewhere.
>
> * We made sure the benchmark tests indeed read/write the test data files from/to unique directories. But like I said - there is no hang in Lucene code reported in the thread dump.
>
> It was very late last night when we stopped, and my eyes were tired, so I didn't summarize it right away. Robert, I hope I've captured everything we did; if not, please add.
>
> Anyone's got any suggestions? It's unfortunate that I'm the only one running into this problem, because whatever the suggestions are, you'll probably need me to confirm them :). And I'm going away for 3 days (camping - no internet ... well at least no laptop :)), so unless someone has a suggestion within the coming few hours, we can continue that when I get back.
>
> Shai
>
>
> I think you got everything. I reopened the JIRA issue too (LUCENE-1709) and listed the things we can do for sure now, such as lowering threadsPerProcessor (and allowing someone to use a system property to override this) and fixing junit temp files to be in the temp directory. Additionally, I would like to fix the ant library problem as mentioned there. It works great from the command line, but we should improve this for IDE users, so they do not see a compile error.
>
> I am personally for the idea of adding the runsequential property to benchmark's build.xml, to force it to run serially. While I am unable to reproduce your problem, it does not surprise me, as I had a tough time trying to prevent benchmark tests from stepping on each other's toes.
>
> --
> Robert Muir
> rcmuir@gmail.com
>
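
For readers following the threadsPerProcessor proposal quoted above, here is a minimal Java sketch of the arithmetic involved: a per-processor thread setting that defaults to 1, can be overridden by a property, and is multiplied by the number of processors. The class and method names are hypothetical, and this is not how Ant or Lucene's build actually implements it; only the property name 'threadsPerProcessor' and the proposed default of '1' come from the thread.

```java
// Hypothetical sketch: not Ant's or Lucene's real code.
public class ThreadCountSketch {

    // Default proposed in the thread: one thread per processor.
    public static final int DEFAULT_THREADS_PER_PROCESSOR = 1;

    // Parses an optional override; falls back to the default when the
    // value is missing, not a number, or not positive.
    public static int threadsPerProcessor(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return DEFAULT_THREADS_PER_PROCESSOR;
        }
        try {
            int parsed = Integer.parseInt(propertyValue.trim());
            return parsed > 0 ? parsed : DEFAULT_THREADS_PER_PROCESSOR;
        } catch (NumberFormatException e) {
            return DEFAULT_THREADS_PER_PROCESSOR;
        }
    }

    // Total worker threads = processors * threads per processor.
    public static int totalThreads(int processors, String propertyValue) {
        return Math.max(1, processors) * threadsPerProcessor(propertyValue);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        // Assumed to be passed as -DthreadsPerProcessor=N on the command line.
        String override = System.getProperty("threadsPerProcessor");
        System.out.println("processors=" + processors
                + " threads=" + totalThreads(processors, override));
    }
}
```

Under these assumptions, a dual-core machine like the one mentioned in the thread would get 2 threads by default and 4 with -DthreadsPerProcessor=2.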
