commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")
Date Fri, 30 Nov 2012 20:04:48 GMT
Le 30/11/2012 20:30, Konstantin Berlin a écrit :
> Hi,

Hi Konstantin,

> 
> I don't know if people are confused about auto-differentation, I
> think most people working in numerical analysis are very well aware
> of what it does. The issue here is that it is a completely separate
> subject from optimizations. In a proper OO design you would not mix
> the two together. Computation of the derivatives is a separate
> component from optimization. Optimization only assumes that you can
> compute one, it shouldn't care or force you two compute one in a
> specific way. That's bad design. Everyone component should only have
> necessary and sufficient requirements for use. Automatic
> differentiation is not necessary for optimization so it should not
> exist in the interface.

You are right.

> 
> I would like to add, that if my function is simple enough, I am very
> capable of running autdiff package a priori myself. The real need for
> finite-difference is when you cannot analytically compute the
> derivative. Though I think having a wrapper function around autdiff
> package is a nice tool to help developers quickly write code, if they
> prefer to have quicker development time but slightly slower code.

> 
> If someone can explain to me why auto-diff package should be directly
> forced or even be provided as alternative into optimization package
> when a simple wrapper function can do this task while maintaining
> much cleaner OO design, please tell me.

You just found the reason and explained it by yourself: it was bad
design and I am the one to blame for this. I was (and still am) stupid.

So I suggest we disconnect differentiation from optimization, but in a
way that would let users decide how they provide the differentials. This
means I would not like to reintroduce the former interfaces.

What about having the optimize() methods taking two arguments function,
a MultivariateFunction for the value and a MultivariateVectorFunction
for the gradient? It would be up to the user to select how they compute
each function. They could use direct coding, they could use a finite
differences wrapper around the first function to implement the second,
they could use our differentiation framework and put one wrapper to
extract the value and another one to extract the gradient.

What do you think?

Luc

> 
> Thanks, Konstantin
> 
> On Nov 30, 2012, at 2:06 PM, Gilles Sadowski
> <gilles@harfang.homelinux.org> wrote:
> 
>> Hi Luc.
>> 
>>>> As a user of the optimization algorithms I am completely
>>>> confused by the change. It seems different from how
>>>> optimization function are typically used and seems to be
>>>> creating a barrier for no reason.
>>> 
>>> The reason is that the framework has been done for several uses,
>>> not only optimization.
>> 
>> Let's admit that the "autodiff" subject is not yet well-known, and
>> viewed from the CM "optimization" package it looks very complex:
>> some examples would be welcome in the user guide. The
>> "DerivativeStructure" itself is not easily grasped (e.g. there are 
>> many constructors and one does not know which are "user-oriented"
>> and which are more linked to the internal structure ("DSCompiler"
>> which is even more cryptic at first glance).
>> 
>> Of course, _I_ just have to start reading about the subject in
>> order to understand; you are not expected to provide the background
>> within the Javadoc! :-)
>> 
>>>> 
>>>> I am not clear why you can't just leave the standard interface
>>>> to an optimizer be a function that computes the value and the
>>>> Jacobian (in case of least-squares), the gradient (for
>>>> quasi-Newton methods) and if you actually have a full newton
>>>> method, also the Hessian.
>>>> 
>>>> If the user wants to compute the Jacobian (gradient) using
>>>> finite differences they can do it themselves, or wrap it into a
>>>> class that you can provide them that will compute finite
>>>> differences using the desired algorithm.
>>> 
>>> This is already what many people do, and it can be done with both
>>> the older and the newer API. Nothing prevents users to use
>>> finite differences in the objects they pass to the optimizer.
>>> 
>>>> 
>>>> Also I can image a case when computation of the Jacobian can be
>>>> sped up if the function value is known, yet if you have two
>>>> separate functions handle the derivatives and the actual
>>>> function value. For example f^2(x). You can probably derive
>>>> some kind of caching scheme, but still.
>>>> 
>>>> Maybe I am missing something, but I spend about an hour trying
>>>> to figure out how change my code to adapt to your new
>>>> framework. Still haven't figured it out.
>>> 
>>> I can easily understand. It is really new and needs some
>>> polishing and documenting. I am sorry for that.
>> 
>> The implementation was a huge work already. Thanks fo that.
>> 
>>> 
>>> In your case, if you already have two different functions, you
>>> can merge them to create a
>>> MultivariateDifferentiableVectorFunction and pass this to the
>>> optimizer. See how 
>>> FunctionUtils.toMultivariateDifferentiableVectorFunctiontoMultivariateDifferentiableVectorFunction
>>>
>>> 
does it, starting from a DifferentiableMultivariateVectorFunction.
>>> 
>>> See below about the deprecation of this converter method,
>>> though.
>>> 
>>> Note that the new API is simply another way to represent the
>>> same information. The former way was limited to first derivatives
>>> and was really awkward when multiple dimensions were involved (as
>>> you derive with respect to several variables, a real function
>>> becomes a vector function (a gradient), a vector function becomes
>>> a matrix function and it becomes quickly untrackable. If you
>>> start thinking about second derivative, it is worse. It was also
>>> difficult when you combine functions, for example if you compute
>>> f(u), it looks like a univariate function, but if you see that u
>>> = g(x, y, z), it is really a multivariate function. When
>>> computing differentials, you have some problems pushing all
>>> partial differentials du/dx, du/dy, du/dz to the df/du function,
>>> there is a bottleneck. The only solution people had was to do the
>>> composition outside by themselves. The new API avoids that, it 
>>> doesn care if u is a simple canonical variable or by itself a
>>> function from some former computations.
>>> 
>>> For information, the original proposal from July about the 
>>> differentiation framework is here: 
>>> <http://mail-archives.apache.org/mod_mbox/commons-dev/201207.mbox/%3C500C2BBF.2020806%40spaceroots.org%3E>.
>>>
>>>>
>>>>
>>> 
On Nov 30, 2012, at 11:11 AM, Gilles Sadowski
>>>> <gilles@harfang.homelinux.org> wrote:
>>>> 
>>>>> Hello.
>>>>> 
>>>>> Context: 1. A user application computes the Jacobian of a 
>>>>> multivariate vector function (the output of a simulation)
>>>>> using finite differences. 2. The covariance matrix is
>>>>> obtained from "AbstractLeastSquaresOptimizer". In the new
>>>>> API, the Jacobian is supposed to be "automatically" computed
>>>>> from the "MultivariateDifferentiableVectorFunction" objective
>>>>> function. 3. The converter from
>>>>> "toMultivariateDifferentiableVectorFunction" to
>>>>> "MultivariateDifferentiableVectorFunction" (in
>>>>> "FunctionUtils") is deprecated.
>>> 
>>> It was both introduced in 3.1 and deprecated at the same time.
>>> It's not the converter per se which is considered wrong, it is
>>> the DifferentiableMultivariateVectorFunction interface which is
>>> deprecated, hence the converter using this interface is
>>> deprecated as a consequence.
>> 
>> I know.
>> 
>> However, you suggested above that we look at the converter code in
>> order to figure out how to switch to the new API. Of course, that's
>> what I've done all day, which made me wonder: If users need to do
>> the same over and over again, why not keep the code in CM indeed?
>> 
>>> 
>>>>> 4. A "FiniteDifferencesDifferentiator" operator currently
>>>>> exists but only for univariate functions. Unles I'm mistaken,
>>>>> a direct extension to multiple variables won't do: * because
>>>>> the implementation uses the symmetric formula, but in some
>>>>> cases (bounded parameter range), it will fail, and * because
>>>>> of the floating point representation of real values, the 
>>>>> delta for sampling the function should depend on the
>>>>> magnitude of of the parameter value around which the sampling
>>>>> is done whereas the "stepSize" is constant in the
>>>>> implementation.
>>>>> 
>>>>> Questions:
>>> 
>>>>> 1. Shouldn't we keep the converters so that users can keep
>>>>> their "home-made" (first-order) derivative computations?
>>>>> [Converters exist for gradient of
>>>>> "DifferentiableMultivariateFunction" and Jacobian of
>>>>> "DifferentiableMultivariateVectorFunction".]
>>> 
>>> We could decide to not deprecate the older interface and hence
>>> not deprecate these converters. We could also set up a converter
>>> that would not use a DifferentiableMultivariateVectorFunction but
>>> could use instead two parameters: one MultivariateVectorFunction
>>> and one MultivariateMatrixFunction.
>>> 
>>> I'm not sure about this. I would really like to get rid of the
>>> former interface which is confusing as we introduced the new
>>> one.
>> 
>> IIUC, the "problem" or "barrier" is (as I hinted at in my previous
>> post) that the "DerivativeStructure" does not bring anything when
>> we just fill it with precomputed values. (is that correct?). It
>> just makes the interface very unfamiliar (using a complex
>> structure) where "double[]" and "double[][]" were fairly
>> straightforward to represent "multivariate" values and Jacobian,
>> respectively. [Maybe this point can be alleviated by an extensive
>> example in the user guide.]
>> 
>>> 
>>>>> 2. Is it worth creating the multivariate equivalent of the 
>>>>> univariate "FiniteDifferencesDifferentiator", assuming that
>>>>> higher orders will rarely be used because of * the loss of
>>>>> accuracy (as stated in the doc), and/or * the sometimes
>>>>> prohibitively expensive number of evaluations of the
>>>>> objective function? [1]
>>> 
>>> Yes, I think it is worth doing, using separate step sizes for
>>> each components (as they may have very different behaviour and
>>> units).
>> 
>> Indeed (see below).
>> 
>>> 
>>>>> 3. As current CM optimization algorithms need only the
>>>>> gradient or Jacobian, would it be sufficient to only provide
>>>>> a limited (two-points first-order) finite differences
>>>>> operator (with the possiblity to choose either "symmetric",
>>>>> "forward" or "backward" formula for each parameter)?
>>> 
>>> I'm not sure. As I explained above, the framework was not done
>>> with only the optimizers in mind (they are an important use of
>>> differentials, but are not the only one). This framework is also
>>> used on its own as a [math] feature, directly from user code.
>>> 
>>> The underlying computation for derivatives using finite
>>> differences can be easily improved to use non-symmetric points,
>>> even for high order derivatives. It is sufficient to compute the
>>> abscissae correctly by changing the single line :
>>> 
>>> final double xi = t0 + stepSize * (i - 0.5 * (nbPoints - 1));
>> 
>> What about: double xi = t0 * (1 + stepSize * (i - 0.5 * (nbPoints -
>> 1))) ? [In order to automatically adapt to the accuracy loss due to
>> floating point. I.e. the "stepSize" would be relative.]
>> 
>>> 
>>> Once the xi are computed, the remaining formulas are valid and
>>> they do not suppose the xi are around the point (they may even
>>> not be regularly spaced). The only constraint is that all xi are
>>> different from each other.
>> 
>> Personally, I'd be pleased to have more documentation in the code
>> (around the part that mentions "differences diagonal arrays",
>> "interpolation" and "monomial") or just a pointer to where that
>> computation is explained.
>> 
>>> 
>>> So it would be possible to first add a enumerate : Formula with
>>> values CENTERED, BACKWARD, FORWARD as a constructor parameter to
>>> the existing univariate FiniteDifferencesDifferentiator and then
>>> to add another differentirator for multivariate functions, using
>>> an array of stepsized and an array of Formula enumerates so all
>>> components are computed according to their constraints. I think
>>> we could use the same number of points for all components, but it
>>> would also be possible to have a different number of points if
>>> needed.
>>> 
>>> Would you like me to do that?
>> 
>> I indeed tried to create the "multivariate" version, before being
>> stuck in even more basic things. E.g. in trying to understand
>> "DerivativeStructure", I went to the unit tests and found this
>> ("testToMultivariateDifferentiableFunction() in 
>> "FunctionUtilsTest): ----- MultivariateDifferentiableFunction
>> converted =
>> FunctionUtils.toMultivariateDifferentiableFunction(hypot); for
>> (double x = 0.1; x < 0.5; x += 0.01) { for (double y = 0.1; y <
>> 0.5; y += 0.01) { DerivativeStructure[] t = new
>> DerivativeStructure[] { new DerivativeStructure(3, 1, x, 1.0, 2.0,
>> 3.0 ), // <-- new DerivativeStructure(3, 1, y, 4.0, 5.0, 6.0 )  //
>> <-- }; DerivativeStructure h = DerivativeStructure.hypot(t[0],
>> t[1]); Assert.assertEquals(h.getValue(), 
>> converted.value(t).getValue(), 1.0e-10); 
>> Assert.assertEquals(h.getPartialDerivative(1, 0, 0), 
>> converted.value(t).getPartialDerivative(1, 0, 0), 1.0e-10); 
>> Assert.assertEquals(h.getPartialDerivative(0, 1, 0), 
>> converted.value(t).getPartialDerivative(0, 1, 0), 1.0e-10); 
>> Assert.assertEquals(h.getPartialDerivative(0, 0, 1), 
>> converted.value(t).getPartialDerivative(0, 0, 1), 1.0e-10); } } 
>> ----- I don't understand the lines which I marked with an arrow. 
>> Why 3 variables, why that constructor (cf. remark above)? In fact,
>> why is it not: ----- new DerivativeStructure(2, 1, 0, x), new
>> DerivativeStructure(2, 1, 1, y) ----- ? [And consequent changes in
>> the calls to "getPartialDerivative(int...)".]
>> 
>> I didn't figure out why the test pass, or what should be result of
>> the computation: "h" does not contain the first partial derivatives
>> of "hypot" at the given (x, y) ...
>> 
>> 
>> Thanks for bits of enlightenment, Gilles
>> 
>> ---------------------------------------------------------------------
>>
>> 
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>> 
> 
> 
> ---------------------------------------------------------------------
>
> 
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message