From: Phil Steitz <phil.steitz@gmail.com>
Date: Tue, 17 May 2011 12:26:39 -0700
To: Commons Developers List <dev@commons.apache.org>
Subject: Re: [math] [GUMP@vmgump]: Project commons-math (in module apache-commons) failed

On 5/17/11 1:22 AM, luc.maisonobe@free.fr wrote:
> ----- "Phil Steitz" wrote:
>
>> On 5/16/11 3:47 PM, Gilles Sadowski wrote:
>>> On Mon, May 16, 2011 at 02:39:01PM -0700, Phil Steitz wrote:
>>>> On 5/16/11 3:44 AM, Dr. Dietmar Wolz wrote:
>>>>> Nikolaus Hansen, Luc, and I discussed this issue in Toulouse.
>>> Reading that, I've been assuming that...
>>>
>>>>> We have two options to handle this kind of failure in tests of
>>>>> stochastic optimization algorithms:
>>>>> 1) fixed random seed - but this reduces the value of the test
>>>>> 2) using the RetryRunner - preferred solution
>>>>>
>>>>> @Retry(3) should be sufficient for all tests.
>>>>>
>>>> The problem with that is that it is really equivalent to just
>>>> reducing the sensitivity of the test from alpha to alpha^3 (if,
>>>> e.g., the test will pick up anomalies with stochastic probability
>>>> of less than alpha as is, making it retry three times really just
>>>> reduces that sensitivity to alpha^3).

This (my statement above) is not quite correct, or at least whether it
is correct or not depends on the problem. While the failure
probabilities may be the same for three retries versus one run with
lower sensitivity, the results mean different things, and the first is
generally more likely to indicate a change-related problem. Sorry for
this mistake.
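To make the arithmetic concrete anyway, here is a back-of-the-envelope
sketch (plain Java, not from our test code; alpha and p are made-up
illustrative values, and I am assuming "pass if any attempt passes"
retry semantics):

    /** Back-of-the-envelope arithmetic for the retry discussion. */
    public class RetryArithmetic {

        public static void main(String[] args) {
            // Probability that a single run fails spuriously (the
            // significance level of the stochastic assertion).
            double alpha = 0.01;

            // With @Retry(3), a spurious failure requires three
            // false positives in a row.
            double spuriousWithRetry = Math.pow(alpha, 3); // 1.0E-6

            // The flip side: if a real regression makes a run fail
            // with probability p < 1, the retried test reports it
            // only when all three runs fail, so sensitivity drops
            // from p to p^3.
            double p = 0.5;
            double detectionWithRetry = Math.pow(p, 3); // 0.125

            System.out.println("spurious, single run : " + alpha);
            System.out.println("spurious, retry x 3  : " + spuriousWithRetry);
            System.out.println("detection, single run: " + p);
            System.out.println("detection, retry x 3 : " + detectionWithRetry);
        }
    }

So retrying trades false positives against sensitivity to marginal,
intermittent regressions; whether that trade is right depends on the
problem, which is the point above.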
>>>> I think the right answer here is to find out why the test is
>>>> failing with higher than, say, .001 probability and fix the
>>>> underlying problem. If the test itself is too sensitive, then we
>>>> should fix that. Then switch to a fixed seed for the released
>>>> code, reverting to random seeding when the code is under
>>>> development.
>>> ... they had settled on the best approach for the class at hand.
>> Whatever rationale was discussed should be summarized here, on the
>> public list.
> We did not look at the code itself when we met, but rather spoke
> about stochastic tests at large. Nikolaus said using an optimization
> algorithm as a black box is clearly not a good thing, Dietmar said
> stochastic tests are useful and may fail sometimes, and I said unit
> tests in a continuous integration process are needed and should not
> fail randomly. All these statements are true, I think; they only
> differ because they look at the problem from different points of
> view. It was basically the same thing we already said on the list
> some months ago about the statistics tests, when we finally chose to
> set up a retry procedure (was it for the Chi square or for the
> Pascal distribution?).

These have pretty much all been removed. I think the RandomDataImpl
tests are the only ones that still use retries. I was planning to
remove those as well.

> There is unfortunately no perfect answer. We talked about both the
> fixed seed approach and the retry procedure, and Dietmar did not
> like the fixed seed, so we chose the other one.
>
> From old memories, I think Ted proposed something different about
> generating random numbers that was used in Mahout. Ted, could you
> explain to us again what you proposed?

I won't speak for Ted, but IIRC, he was the first to advocate fixed
seeds. After thinking more about the problem, I think the best
approach is random seeds during development, changed to fixed seeds
prior to release. In the random data generation tests, we can state
and control precisely the probability that a test will fail randomly.
When working on the code, it's best to set this fairly low and use
random seeds. Generally, when you screw something up, failures will
happen consistently. Even with a fixed seed and p(false positive) =
.0001, the tests fail pretty reliably when something is broken. So
the approach above works well for these. I guess the random seed +
retry approach will also work here in general; but this is going to
depend on the problem, and it makes the sensitivities harder to set
and understand.
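To be concrete, a test in that style might look roughly like the
following (a minimal sketch in the spirit of the RandomDataImpl
tests, using the commons-math TestUtils chi-square test; the class
name, seed, bin count, and alpha are made up for illustration, not
what is actually in our test code):

    import org.apache.commons.math.random.RandomDataImpl;
    import org.apache.commons.math.stat.inference.TestUtils;
    import org.junit.Assert;
    import org.junit.Test;

    public class NextIntUniformityTest {

        @Test
        public void testNextIntUniform() throws Exception {
            RandomDataImpl randomData = new RandomDataImpl();

            // Fixed seed for released code; while developing, drop
            // this (or reSeed with System.currentTimeMillis()) to
            // get a fresh random seed on each run.
            randomData.reSeed(1000L);

            // Generate n values in [0, bins - 1] (bounds inclusive)
            // and count how many land in each bin.
            final int bins = 4;
            final int n = 1000;
            long[] observed = new long[bins];
            for (int i = 0; i < n; i++) {
                observed[randomData.nextInt(0, bins - 1)]++;
            }
            double[] expected = new double[bins];
            for (int i = 0; i < bins; i++) {
                expected[i] = (double) n / bins;
            }

            // alpha is exactly the probability of a spurious
            // failure, so the false positive rate is stated
            // explicitly rather than implicit in a retry count.
            final double alpha = 0.0001;
            Assert.assertFalse("nextInt distribution is not uniform",
                    TestUtils.chiSquareTest(expected, observed, alpha));
        }
    }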
And while the retry approach drives down the probability of spurious
failure, it does not eliminate it. Do we have any way to bound or
estimate the expected failure probability of the optimization tests?
Can we relate these estimates to expected errors in returned values?
What bothers me about just setting a retry number is that a) there
may be an underlying problem that is being masked, and b) if there is
any way that we can estimate the likelihood of bad results, we should
document that.

Phil

>>> [I.e. we had raised the possibility that there could be a bug in
>>> the code that triggered test failures, but IIUC they have now
>>> concluded that the code is fine and that failures are expected to
>>> happen sometimes.]
>> I would like to understand better why that is the case. If
>> failures happen sometimes in test, does that mean that bad results
>> are expected to be returned sometimes? If so, have we documented
>> that?
>>
>>> It still seems strange that it is always the same 2 tests that
>>> fail. Is there an explanation for this behaviour, that we might
>>> add as a comment in the test code?
>> I agree here, and possibly in the javadoc for the application
>> code. If the code is prone to generating spurious results
>> sometimes, we need to make that clear in the javadoc.
> It really depends on the function you optimize, with or without
> local minima. Perhaps this test case is for a known difficult
> problem; I didn't look at this.
>
> Luc
>
>> Phil
>>> Gilles