mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Tests running time
Date Sun, 11 Dec 2011 19:35:24 GMT
The right way to handle this is to have instances get a random number
generator that works like it should.  Magic resets in the middle of
operation are not a good idea.

I think we need a better way to inject generators that doesn't involve
statics.
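
As a rough illustration of what injecting a generator without statics could
look like, here is a minimal sketch. The names Sampler and SamplerDemo are
hypothetical and are not Mahout classes; the only assumption is that each
instance receives its java.util.Random from its caller instead of reading a
shared static one.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical example class, not part of Mahout.
final class Sampler {
    private final Random random;

    // The caller decides which generator this instance uses, so no
    // static state is involved and nothing else can reset it mid-stream.
    Sampler(Random random) {
        this.random = random;
    }

    // Draws n pseudo-random doubles from the injected generator.
    double[] sample(int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = random.nextDouble();
        }
        return out;
    }
}

final class SamplerDemo {
    public static void main(String[] args) {
        // Each instance owns its generator; identical seeds give
        // identical sequences regardless of what other tests do.
        Sampler a = new Sampler(new Random(42L));
        Sampler b = new Sampler(new Random(42L));
        System.out.println(Arrays.equals(a.sample(5), b.sample(5)));
    }
}
```

A test written this way controls its own seed, so running tests in parallel
in one JVM would not change the sequence any single test sees. It does not
address Sean's concern about classes that are instantiated in very large
numbers; those could still be handed a shared generator explicitly.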

On Sun, Dec 11, 2011 at 6:24 AM, Sean Owen <srowen@gmail.com> wrote:

> Yes that's exactly what's happening -- not why the tests aren't running
> fast, but why running them in parallel in one JVM results in
> non-deterministic results.
>
> If by "not use statics" you mean hold a static reference to a Random in
> client code, yes, that could help, except that you'd also have to not share
> objects within a test. And of course it's possible a random value affects
> initialization of another, shared static data structure. And, making a
> Random non-static may be infeasible: imagine a class which is instantiated
> a billion times and shares one RNG; giving each instance its own RNG would
> dramatically increase memory use and init time. And you would never be able
> to eliminate all non-determinism: consider third-party code we use. I don't
> know if it's feasible, unfortunately.
>
> The point of resetting the RNGs is to (try to) make the tests repeatable.
> If a test fails 1 in 1000 times due to some sequence of random numbers and
> can't be reproduced we have a hard time tracking it down. You have to reset
> potentially every RNG to a known state, not just ones that are instantiated
> later, or else the purpose is defeated.
>
> I think it's coincidence that it passes with a different RNG set-up; the
> test expects the RNG to not be reset in the middle (as it works in serial
> execution only) and in parallel execution, when the reset is turned off, it
> doesn't. Of course it could fall the other way -- parallel execution causes
> some shared RNG to generate a different sequence, and that's what in theory
> we're trying to dodge, since it's not repeatable.
>
>
> On Sun, Dec 11, 2011 at 1:08 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>
> > In working through what I _think_ will be the primary viable way to make
> > this stuff faster (parallel execution, fork once) it appears to me that the
> > primary concurrency issue is due to how we initialize the test seed and the
> > fact that we loop over all RandomWrapper objects and reset them.  So, it's
> > likely the case that in mid stream of some of the tests, the RNG is getting
> > reset by other calls to the static useTestSeed() method.
> >
> > Of course, there might be other concurrency issues beyond that, but this
> > seems like the most likely one to start with.  Thus, the question is how to
> > fix it.  The obvious one, I suppose, is to not use statics for this stuff.
> > Another is, perhaps, to use a system property (-DuseTestSeed=true and/or
> > -DuseSeed=<SEED>, the latter being useful for debugging other things) that
> > is set upon invocation in the test plugin, but has the downside that it
> > would also need to be set when running from an IDE.
> >
> > And, to Sean's point below, it seems that we may have some test
> > dependencies on the specific set of random numbers and the outcomes they
> > produce.
> >
> > Thoughts?  Other ideas?
> >
> >
>

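For reference, the system-property option in Grant's quoted message could be
sketched roughly as follows. The property name useSeed comes from that
message; the TestSeeds class and its fixed fallback seed are assumptions made
for illustration and are not existing Mahout code.

```java
import java.util.Random;

// Hypothetical helper, not part of Mahout.
final class TestSeeds {
    // Property name from the thread: -DuseSeed=<SEED>
    static final String SEED_PROPERTY = "useSeed";

    // Assumed fallback so runs from an IDE, where the property is
    // usually not set, still use a known seed. The value is arbitrary.
    static final long DEFAULT_SEED = 0xCAFEL;

    private TestSeeds() {
    }

    // Reads the seed from the system property, or falls back to the default.
    static long seed() {
        String value = System.getProperty(SEED_PROPERTY);
        return value == null ? DEFAULT_SEED : Long.parseLong(value);
    }

    // Creates a fresh generator in the known state; nothing is reset in place.
    static Random newRandom() {
        return new Random(seed());
    }
}
```

Because each call to newRandom() returns a new generator, no test resets
another test's generator while it is in use. Falling back to a fixed seed is
one possible way around the IDE downside, at the cost of hiding the fact that
the property was never set.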