commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Konstantin Berlin <kber...@gmail.com>
Subject Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")
Date Sat, 01 Dec 2012 00:42:11 GMT
Hi,

Now that I have some time, let me try to make my case clearly. First I want to say that this
is not some attack on the automatic-differentation package. I love automatic-differentation
and symbolic packages. I personally cannot compute a derivative without a computer for the
life of me. And I am also glad we are finally starting to agree on something :)

This discussion is about figuring out how an incorrect framework and storage affects the performance
of an optimization. That is why I am so worried.

So lets start with the basics. A Newton method must compute a descent step by solving a linear
equation

H*p = -g, (1)

where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about
to say also holds for non-linear least squares method, where Hessian is approximated using
the Jacobian as

H \approx J^T*J+\lambda I.

Now, if you are not solving a simple optimization problem that you keep giving examples for,
you can easily have a Hessian be a very large matrix,
like 1000x1000 or larger. Now, you better hope that you are storing H using your linear algebra
framework, otherwise eq. (1) computation is going to take a while.
This is actually a very active area of research, and that is why having sparse linear algebra
(aren't you removing this? ;) ) and iterative solvers is important to a lot of people.

What you are proposing as good OO style is that if someone has a function that they want to
optimize, they need to take what is probably already a RealMatrix or [][], and create 1000x1000
DerivativeStructure objects, that,
within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix?
Not only have you just fragmented your heap space, but your garbage collector
is going crazy, and you are wasting an incredible amount of memory. Now imagine if your Jacobian
is actually very simple to compute but large, then this whole conversion back and forth actually
takes much longer than the function evaluation.

So, why exactly are you insisting on taking this performance penalty? You claim that the optimizer
can work better if it gets a DerivativeStructure, why? Where is the fact that you are holding
a DerivativeStructure come into effect for a Newton method? Can you provide any literature
in that regard? The classical optimization bottleneck, if not the actual function evaluation,
is eq. (1). The optimization methodology is build on top of years of research in computational
linear algebra concepts, and does not care how the matrices are actually computed. Efficient
memory usage and speed is critical here because in Newton optimizations eq. (1) is evaluated
thousands of times. The goal of the Jacobian is not to store derivatives it is to store a
matrix of numbers that allows you to quadratically approximate your target function.

You guys are focusing on people using finite differences and trying to optimize a newton method
to use finite differences. This is not what the main purpose of a newton method is. If you
want a method that dynamically adjusts finite differences step size, you should switch to
BOBYQA, which was designed for this case.

Derivatives can be computed by a computer using a symbolic toolbox a priori (something that
I was referring to when I accidentally said auto-differenation). They can also be efficiently
simplified by that toolbox before being pasted back into you program. Auto-diff could also
be an important tool for people if their problem is simple or they are not concerned with
optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton
optimization. If you want to talk about proper OO design and internal simplification then
you should focus on being able to pass a linear solver into your optimization method. This
way you allow people to use iterative and sparse solvers when needed. If you are concerned
about people getting derivatives wrong, you can do what MATLAB does, which is to add an option
to check the derivatives using finite differences when debugging. 

This brings me to my second issue. There are important problems where computation of the derivatives
combined with the actual function evaluation is *significantly* faster than computing each
alone. I am not clear why I am hitting such resistance with this. For example, you are doing
some sort of a simulation, which can be trivially adjusted in the end to give a derivative
or a function value. A very real example of this is a Fast Multipole Method, which takes time
to compute a series expansion of the function, but the resulting series expansion can be analytically
differentiated.

What I suggested as function interfaces was just an initial quick suggestion that I would
be happy for anyone to change in a way that everyone feels positive about. I just don't want
my second issue to be ignored like a non-issue.

Thanks for your time,
Konstantin

On Nov 30, 2012, at 5:15 PM, Gilles Sadowski <gilles@harfang.homelinux.org> wrote:

>>> 
>>> I don't know if people are confused about auto-differentation, I
>>> think most people working in numerical analysis are very well aware
>>> of what it does. The issue here is that it is a completely separate
>>> subject from optimizations. In a proper OO design you would not mix
>>> the two together. Computation of the derivatives is a separate
>>> component from optimization. Optimization only assumes that you can
>>> compute one, it shouldn't care or force you two compute one in a
>>> specific way. That's bad design. Everyone component should only have
>>> necessary and sufficient requirements for use. Automatic
>>> differentiation is not necessary for optimization so it should not
>>> exist in the interface.
>> 
>> You are right.
>> 
>>> 
>>> I would like to add, that if my function is simple enough, I am very
>>> capable of running autdiff package a priori myself. The real need for
>>> finite-difference is when you cannot analytically compute the
>>> derivative. Though I think having a wrapper function around autdiff
>>> package is a nice tool to help developers quickly write code, if they
>>> prefer to have quicker development time but slightly slower code.
>> 
>>> 
>>> If someone can explain to me why auto-diff package should be directly
>>> forced or even be provided as alternative into optimization package
>>> when a simple wrapper function can do this task while maintaining
>>> much cleaner OO design, please tell me.
>> 
>> You just found the reason and explained it by yourself: it was bad
>> design and I am the one to blame for this. I was (and still am) stupid.
> 
> I'm not convinced that it was bad design. Maybe I'm completely off now:
> I thought that the purpose was internal simplification. I don't think
> that it can be inherently wrong to store any sort of derivatives (be it
> gradient or Jacobian) into a structure aimed at storing... partial
> derivatives (i.e. "DerivativeStructure").
> 
> Please let me know whether the problem is more in-depth than what I
> described previously (allowing the current way to provide the gradient
> and Jacobian).
> 
> I would nevertheless have suggested to delete the method
>    MultivariateFunction partialDerivative(int k);
> from "DifferentiableMultivariateFunction".
> 
>> 
>> So I suggest we disconnect differentiation from optimization, but in a
>> way that would let users decide how they provide the differentials. This
>> means I would not like to reintroduce the former interfaces.
>> 
>> What about having the optimize() methods taking two arguments function,
>> a MultivariateFunction for the value and a MultivariateVectorFunction
>> for the gradient? It would be up to the user to select how they compute
>> each function. They could use direct coding, they could use a finite
>> differences wrapper around the first function to implement the second,
> 
> 
> 
>> they could use our differentiation framework and put one wrapper to
>> extract the value and another one to extract the gradient.
> 
> That would also be a nice example of usage of the differentiation
> framework.
> 
>> What do you think?
> 
> Yes. Now I got it (I think). IIUC, we could even introduce the new interface
> before 3.1, and deprecated old the older ones (as you wanted to do anyways,
> IIRC).
> 
> I'll give it a try in the next days. OK?
> 
> 
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message