Date: Fri, 30 Nov 2012 20:52:54 +0100
From: Luc Maisonobe
To: Commons Developers List
Subject: [math] differentiation framework

[changing the subject to something different for this sub-thread]

On 30/11/2012 20:06, Gilles Sadowski wrote:
> Hi Luc.

Hi Gilles,

>
>>> As a user of the optimization algorithms I am completely confused by
>>> the change. It seems different from how optimization functions are
>>> typically used and seems to be creating a barrier for no reason.
>>
>> The reason is that the framework was designed for several uses, not
>> only optimization.
>
> Let's admit that the "autodiff" subject is not yet well-known, and
> viewed from the CM "optimization" package it looks very complex: some
> examples would be welcome in the user guide.
> The "DerivativeStructure" itself is not easily grasped (e.g. there are
> many constructors and one does not know which are "user-oriented" and
> which are more linked to the internal structure, i.e. "DSCompiler",
> which is even more cryptic at first glance).

I fully agree.

>
> Of course, _I_ just have to start reading about the subject in order to
> understand; you are not expected to provide the background within the
> Javadoc! :-)

If you want some background, read the paper referenced in the API. It is
really a good paper.
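As an aside, since the constructors seem to be the first hurdle: for
normal use, only two of them should be needed, the "constant" one and
the "variable" one (the varargs one is mainly for packing precomputed
derivatives, and the rest is internal plumbing). A minimal sketch, just
for illustration:

import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

public class DerivativeStructureBasics {
    public static void main(String[] args) {
        // a constant: 2 free parameters, derivation order 2, value 3.5
        DerivativeStructure c = new DerivativeStructure(2, 2, 3.5);

        // the canonical variables x and y: free parameters count,
        // derivation order, variable index, current value
        DerivativeStructure x = new DerivativeStructure(2, 2, 0, 2.0);
        DerivativeStructure y = new DerivativeStructure(2, 2, 1, 4.0);

        // operations carry all partial derivatives along automatically
        DerivativeStructure f = x.multiply(y).add(c);

        System.out.println(f.getValue());                 // value at (2, 4)
        System.out.println(f.getPartialDerivative(1, 0)); // df/dx
        System.out.println(f.getPartialDerivative(1, 1)); // d2f/dxdy
    }
}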
>
>>>
>>> I am not clear why you can't just leave the standard interface to an
>>> optimizer be a function that computes the value and the Jacobian (in
>>> case of least-squares), the gradient (for quasi-Newton methods) and,
>>> if you actually have a full Newton method, also the Hessian.
>>>
>>> If the user wants to compute the Jacobian (gradient) using finite
>>> differences they can do it themselves, or wrap it into a class that
>>> you can provide them that will compute finite differences using the
>>> desired algorithm.
>>
>> This is already what many people do, and it can be done with both the
>> older and the newer API. Nothing prevents users from using finite
>> differences in the objects they pass to the optimizer.
>>
>>>
>>> Also I can imagine a case where the computation of the Jacobian can
>>> be sped up if the function value is known, yet not if you have two
>>> separate functions handling the derivatives and the actual function
>>> value. For example f^2(x). You can probably derive some kind of
>>> caching scheme, but still.
>>>
>>> Maybe I am missing something, but I spent about an hour trying to
>>> figure out how to change my code to adapt to your new framework.
>>> Still haven't figured it out.
>>
>> I can easily understand. It is really new and needs some polishing and
>> documenting. I am sorry for that.
>
> The implementation was a huge work already. Thanks for that.

You're welcome.

>
>>
>> In your case, if you already have two different functions, you can
>> merge them to create a MultivariateDifferentiableVectorFunction and
>> pass this to the optimizer. See how
>> FunctionUtils.toMultivariateDifferentiableVectorFunction does it,
>> starting from a DifferentiableMultivariateVectorFunction.
>>
>> See below about the deprecation of this converter method, though.
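To make this merge concrete, here is a minimal sketch (the class name is
mine, for illustration only). Note it is only valid when the
DerivativeStructure arguments are the canonical variables themselves;
the FunctionUtils converter also handles the case where they are
functions of other variables, as discussed at the end of this message:

import org.apache.commons.math3.analysis.MultivariateMatrixFunction;
import org.apache.commons.math3.analysis.MultivariateVectorFunction;
import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;
import org.apache.commons.math3.analysis.differentiation.MultivariateDifferentiableVectorFunction;

public class ValueAndJacobianWrapper
    implements MultivariateDifferentiableVectorFunction {

    private final MultivariateVectorFunction value;
    private final MultivariateMatrixFunction jacobian;

    public ValueAndJacobianWrapper(MultivariateVectorFunction value,
                                   MultivariateMatrixFunction jacobian) {
        this.value    = value;
        this.jacobian = jacobian;
    }

    public double[] value(double[] point) {
        return value.value(point);
    }

    public DerivativeStructure[] value(DerivativeStructure[] point) {
        // evaluate the user functions at the primitive point
        double[] x = new double[point.length];
        for (int i = 0; i < x.length; ++i) {
            x[i] = point[i].getValue();
        }
        double[]   f = value.value(x);
        double[][] j = jacobian.value(x);

        // pack value and first derivatives, component by component
        DerivativeStructure[] result = new DerivativeStructure[f.length];
        for (int i = 0; i < f.length; ++i) {
            double[] data = new double[1 + x.length];
            data[0] = f[i];
            System.arraycopy(j[i], 0, data, 1, x.length);
            result[i] = new DerivativeStructure(x.length, 1, data);
        }
        return result;
    }
}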
>>
>> Note that the new API is simply another way to represent the same
>> information. The former way was limited to first derivatives and was
>> really awkward when multiple dimensions were involved: as you derive
>> with respect to several variables, a real function becomes a vector
>> function (a gradient), a vector function becomes a matrix function,
>> and it quickly becomes intractable. If you start thinking about
>> second derivatives, it is worse. It was also difficult when you
>> combine functions: for example if you compute f(u), it looks like a
>> univariate function, but if you see that u = g(x, y, z), it is really
>> a multivariate function. When computing differentials, you have
>> problems pushing all the partial differentials du/dx, du/dy, du/dz
>> through to the df/du function; there is a bottleneck. The only
>> solution people had was to do the composition outside by themselves.
>> The new API avoids that: it doesn't care whether u is a simple
>> canonical variable or is itself a function resulting from some former
>> computations.
>>
>> For information, the original proposal from July about the
>> differentiation framework is here:
>> .
>>
>>>
>>> On Nov 30, 2012, at 11:11 AM, Gilles Sadowski
>>> wrote:
>>>
>>>> Hello.
>>>>
>>>> Context:
>>>> 1. A user application computes the Jacobian of a
>>>> multivariate vector function (the output of a simulation) using
>>>> finite differences.
>>>> 2. The covariance matrix is obtained from
>>>> "AbstractLeastSquaresOptimizer". In the new API, the Jacobian is
>>>> supposed to be "automatically" computed from the
>>>> "MultivariateDifferentiableVectorFunction" objective function.
>>>> 3. The converter from "DifferentiableMultivariateVectorFunction"
>>>> to "MultivariateDifferentiableVectorFunction" (in "FunctionUtils")
>>>> is deprecated.
>>
>> It was both introduced in 3.1 and deprecated at the same time. It's
>> not the converter per se which is considered wrong, it is the
>> DifferentiableMultivariateVectorFunction interface which is
>> deprecated; hence the converter using this interface is deprecated as
>> a consequence.
>
> I know.
>
> However, you suggested above that we look at the converter code in
> order to figure out how to switch to the new API. Of course, that's
> what I've done all day, which made me wonder: if users need to do the
> same over and over again, why not keep the code in CM indeed?

Fair enough.

>
>>
>>>> 4. A "FiniteDifferencesDifferentiator" operator currently exists
>>>> but only for univariate functions.
>>>> Unless I'm mistaken, a direct extension to multiple variables won't
>>>> do:
>>>> * because the implementation uses the symmetric formula, but in
>>>> some cases (bounded parameter range) it will fail, and
>>>> * because of the floating point representation of real values, the
>>>> delta for sampling the function should depend on the magnitude of
>>>> the parameter value around which the sampling is done, whereas
>>>> the "stepSize" is constant in the implementation.
>>>>
>>>> Questions:
>>
>>>> 1. Shouldn't we keep the converters so that users can keep their
>>>> "home-made" (first-order) derivative computations? [Converters
>>>> exist for the gradient of "DifferentiableMultivariateFunction" and
>>>> the Jacobian of "DifferentiableMultivariateVectorFunction".]
>>
>> We could decide to not deprecate the older interface and hence not
>> deprecate these converters. We could also set up a converter that
>> would not use a DifferentiableMultivariateVectorFunction but could
>> instead take two parameters: one MultivariateVectorFunction and one
>> MultivariateMatrixFunction.
>>
>> I'm not sure about this. I would really like to get rid of the former
>> interface, which is confusing now that we have introduced the new
>> one.
>
> IIUC, the "problem" or "barrier" is (as I hinted at in my previous
> post) that the "DerivativeStructure" does not bring anything when we
> just fill it with precomputed values. (Is that correct?)

Almost, but not exactly. If you only use a DerivativeStructure as output
and do not use it in further computations, then you are right: it can be
seen as a simple container, overly engineered. However,
DerivativeStructure instances are very interesting when they are used as
the *input* of further computations, because in these computations you
get all the derivative compositions without any complex development.

Going back to one of my examples in this thread, consider v = f(u) and
u = g(x, y, z). If g is really complex, you can use the provided finite
differences wrapper to get an approximate estimate of the value u and
all the partial derivatives du/dx, du/dy, d2u/dx2, d2u/dxdy, d2u/dy2 for
example, all packed in one DerivativeStructure variable. If g is simple,
you could compute the differentials by yourself and pack them into a
DerivativeStructure, which here would be used only as output and would
simply look like a container for all these values.

Suppose now that f is something simpler, say
f(u) = sqrt(1 + exp(u^3 - sin(u))). I could differentiate it manually,
but I would fear being asked to develop by myself the code to compute
df/dx, df/dy, d2f/dx2, d2f/dxdy, d2f/dy2! This is where
DerivativeStructure helps. Here, I would simply use the methods provided
by DerivativeStructure to do the computation, i.e. I would simply
compute one sine, one cube, one subtraction ... and I would get all the
derivatives magically out (not for free, since as Konstantin pointed
out, the computation is done underneath), but it certainly saves
development time for the human who implements f.
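In code, implementing f then boils down to chaining the operations (a
sketch only; for brevity u is created here as a plain canonical
variable, but it could just as well be the packed output of g):

import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

public class ComposedFunctionExample {
    public static void main(String[] args) {
        // u(x, y): 2 free parameters, derivation order 2; here simply u = x
        DerivativeStructure u = new DerivativeStructure(2, 2, 0, 1.0);

        // f(u) = sqrt(1 + exp(u^3 - sin(u))): one cube, one sine, one
        // subtraction, one exponential, one addition, one square root,
        // and all the composed derivatives come out with it
        DerivativeStructure f =
            u.pow(3).subtract(u.sin()).exp().add(1.0).sqrt();

        System.out.println(f.getValue());                 // f
        System.out.println(f.getPartialDerivative(1, 0)); // df/dx
        System.out.println(f.getPartialDerivative(2, 0)); // d2f/dx2
        System.out.println(f.getPartialDerivative(1, 1)); // d2f/dxdy
    }
}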
> It just makes the interface very unfamiliar (using a complex
> structure) where "double[]" and "double[][]" were fairly
> straightforward to represent "multivariate" values and Jacobians,
> respectively. [Maybe this point can be alleviated by an extensive
> example in the user guide.]

In the other sub-thread, I will propose to remove this hurdle from the
optimization framework completely.

>
>>
>>>> 2. Is it worth creating the multivariate equivalent of the
>>>> univariate "FiniteDifferencesDifferentiator", assuming that higher
>>>> orders will rarely be used because of
>>>> * the loss of accuracy (as stated in the doc), and/or
>>>> * the sometimes prohibitively expensive number of evaluations of
>>>> the objective function? [1]
>>
>> Yes, I think it is worth doing, using a separate step size for each
>> component (as they may have very different behaviours and units).
>
> Indeed (see below).
>
>>
>>>> 3. As current CM optimization algorithms need only the gradient or
>>>> Jacobian, would it be sufficient to only provide a limited
>>>> (two-point, first-order) finite differences operator (with the
>>>> possibility to choose either the "symmetric", "forward" or
>>>> "backward" formula for each parameter)?
>>
>> I'm not sure. As I explained above, the framework was not designed
>> with only the optimizers in mind (they are an important use of
>> differentials, but not the only one). This framework is also used on
>> its own as a [math] feature, directly from user code.
>>
>> The underlying computation of derivatives using finite differences
>> can easily be improved to use non-symmetric points, even for high
>> order derivatives. It is sufficient to compute the abscissae
>> correctly by changing the single line:
>>
>> final double xi = t0 + stepSize * (i - 0.5 * (nbPoints - 1));
>
> What about:
> double xi = t0 * (1 + stepSize * (i - 0.5 * (nbPoints - 1)))
> ?
> [In order to automatically adapt to the accuracy loss due to floating
> point. I.e. the "stepSize" would be relative.]

It would not work when t0 = 0, and it would behave strangely when t0
varies widely in magnitude. In many cases in my work, for example, t0 is
really a time offset in seconds from the beginning of a simulation, so I
have loops where t0 starts at 0 and ends at 1000000. Although the order
of magnitude of t0 changes, the dynamics do not change and the evolution
rate near 0 is the same as it is near 1000000. In this case, a relative
step size would lead to decreasing accuracy as time flows.
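Just to spell the two sampling schemes out side by side (an illustration
only, the helper names are mine):

public class StepSchemes {

    // absolute step, as currently implemented: the spacing is the same
    // whether t0 is 0 or 1000000, and t0 = 0 is handled fine
    static double xiAbsolute(double t0, double stepSize,
                             int i, int nbPoints) {
        return t0 + stepSize * (i - 0.5 * (nbPoints - 1));
    }

    // relative step, as proposed: the spacing scales with t0, which
    // tracks the floating-point resolution near t0 but collapses all
    // sample points onto t0 itself when t0 = 0
    static double xiRelative(double t0, double stepSize,
                             int i, int nbPoints) {
        return t0 * (1 + stepSize * (i - 0.5 * (nbPoints - 1)));
    }
}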
>
>>
>> Once the xi are computed, the remaining formulas remain valid; they
>> do not assume the xi are centered around the point (they may not even
>> be regularly spaced). The only constraint is that all the xi are
>> different from each other.
>
> Personally, I'd be pleased to have more documentation in the code
> (around the part that mentions "differences diagonal arrays",
> "interpolation" and "monomial"), or just a pointer to where that
> computation is explained.

I hear you... There is never enough documentation.

>
>>
>> So it would be possible to first add an enum, Formula, with values
>> CENTERED, BACKWARD and FORWARD, as a constructor parameter to the
>> existing univariate FiniteDifferencesDifferentiator, and then to add
>> another differentiator for multivariate functions, using an array of
>> step sizes and an array of Formula values so that all components are
>> computed according to their constraints. I think we could use the
>> same number of points for all components, but it would also be
>> possible to have a different number of points if needed.
>>
>> Would you like me to do that?
>
> I indeed tried to create the "multivariate" version, before getting
> stuck on even more basic things.
> E.g. in trying to understand "DerivativeStructure", I went to the unit
> tests and found this ("testToMultivariateDifferentiableFunction()" in
> "FunctionUtilsTest"):
> -----
> MultivariateDifferentiableFunction converted
>     = FunctionUtils.toMultivariateDifferentiableFunction(hypot);
> for (double x = 0.1; x < 0.5; x += 0.01) {
>     for (double y = 0.1; y < 0.5; y += 0.01) {
>         DerivativeStructure[] t = new DerivativeStructure[] {
>             new DerivativeStructure(3, 1, x, 1.0, 2.0, 3.0), // <--
>             new DerivativeStructure(3, 1, y, 4.0, 5.0, 6.0)  // <--
>         };
>         DerivativeStructure h = DerivativeStructure.hypot(t[0], t[1]);
>         Assert.assertEquals(h.getValue(),
>                             converted.value(t).getValue(), 1.0e-10);
>         Assert.assertEquals(h.getPartialDerivative(1, 0, 0),
>                             converted.value(t).getPartialDerivative(1, 0, 0),
>                             1.0e-10);
>         Assert.assertEquals(h.getPartialDerivative(0, 1, 0),
>                             converted.value(t).getPartialDerivative(0, 1, 0),
>                             1.0e-10);
>         Assert.assertEquals(h.getPartialDerivative(0, 0, 1),
>                             converted.value(t).getPartialDerivative(0, 0, 1),
>                             1.0e-10);
>     }
> }
> -----
> I don't understand the lines which I marked with an arrow.

When I presented this framework to a co-worker of mine, he also told me
this was weird. It is.

> Why 3 variables, why that constructor (cf. remark above)? In fact, why
> is it not:
> -----
> new DerivativeStructure(2, 1, 0, x),
> new DerivativeStructure(2, 1, 1, y)

This test case was intended to stress the code and detect wrong
computations, both in the variable indices and in the derivative values,
taking into account the fact that the variables can themselves be
functions of other variables. So here we have three unspecified
variables (let's call them a, b and c), and we are in the middle of a
computation with two functions x(a, b, c) and y(a, b, c), with the x and
y values being specified by the loops and dx/da = 1, dx/db = 2,
dx/dc = 3, dy/da = 4, dy/db = 5, dy/dc = 6. The change you suggest above
would simply have computed the partial derivatives with respect to x and
y, whereas the test does compute the composed derivatives with respect
to a, b and c.

> -----
> ? [And consequent changes in the calls to
> "getPartialDerivative(int...)".]
>
> I didn't figure out why the test passes, or what the result of the
> computation should be: "h" does not contain the first partial
> derivatives of "hypot" at the given (x, y) ...
>
>
> Thanks for bits of enlightenment,

Hope this helps,
Luc

> Gilles