On 12/01/2012 01:42 AM, Konstantin Berlin wrote:
> Hi,
>
> Now that I have some time, let me try to make my case clearly. First, I want to say that this is not some attack on the automatic-differentiation package. I love automatic differentiation and symbolic packages; I personally cannot compute a derivative without a computer for the life of me. And I am also glad we are finally starting to agree on something :)
>
> This discussion is about how an incorrect framework and storage choice affects the performance of an optimization. That is why I am so worried.
>
> So let's start with the basics. A Newton method must compute a descent step by solving the linear system
>
> H*p = -g, (1)
>
> where H is the Hessian, g is the gradient, and p is the desired descent step. What I am about to say also holds for non-linear least-squares methods, where the Hessian is approximated via the Jacobian as
>
> H \approx J^T*J+\lambda I.
>
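> To make eq. (1) concrete, here is a minimal sketch (plain arrays and a
> textbook Gaussian elimination, not the Commons Math API) of computing a
> Newton step for a tiny dense Hessian; for a 1000x1000 H you would hand
> this to a real linear-algebra package instead:

```java
// Sketch only: solve the Newton system H*p = -g for a small dense H
// using Gaussian elimination with partial pivoting.
public class NewtonStep {

    // Solve A*x = b; A is n x n, b has length n.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][n + 1];   // augmented matrix [A | b]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col;                      // partial pivoting
            for (int r = col + 1; r < n; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            for (int r = col + 1; r < n; r++) { // eliminate below the pivot
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[n];             // back-substitution
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int j = i + 1; j < n; j++) s -= m[i][j] * x[j];
            x[i] = s / m[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        // Hessian and gradient of f(x,y) = 2x^2 + y^2 at the point (1, 1):
        double[][] h = {{4, 0}, {0, 2}};
        double[] g = {4, 2};
        double[] p = solve(h, new double[]{-g[0], -g[1]}); // H*p = -g
        System.out.println(p[0] + " " + p[1]); // step back to the minimum at (0,0)
    }
}
```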
> Now, if you are not solving the kind of simple optimization problem you keep giving examples for, the Hessian can easily be a very large matrix,
> like 1000x1000 or larger. You had better hope you are storing H in your linear-algebra framework, otherwise solving eq. (1) is going to take a while.
> This is actually a very active area of research, and that is why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.
>
> What you are proposing as good OO style is that someone who has a function they want to optimize must take what is probably already a RealMatrix or double[][] and create 1000x1000 DerivativeStructure objects that,
> within the next 10 lines of the optimizer, are going to be converted back into a RealMatrix? Not only have you just fragmented your heap space, but your garbage collector
> is going crazy and you are wasting an incredible amount of memory. Now imagine that your Jacobian is large but very simple to compute: the whole conversion back and forth then takes much longer than the function evaluation itself.
>
> So, why exactly are you insisting on taking this performance penalty? You claim the optimizer can work better if it gets a DerivativeStructure; why? Where does the fact that you are holding a DerivativeStructure come into play for a Newton method? Can you provide any literature in that regard? The classical optimization bottleneck, if it is not the function evaluation itself, is eq. (1). The optimization methodology is built on years of research in computational linear algebra, and it does not care how the matrices were computed. Efficient memory usage and speed are critical here, because a Newton optimization solves eq. (1) thousands of times. The purpose of the Jacobian is not to store derivatives; it is to store a matrix of numbers that lets you quadratically approximate your target function.
>
> You guys are focusing on people using finite differences and trying to optimize a Newton method around them. That is not the main purpose of a Newton method. If you want a method that dynamically adjusts the finite-difference step size, you should switch to BOBYQA, which was designed for this case.
>
> Derivatives can be computed by a computer using a symbolic toolbox a priori (which is what I was referring to when I accidentally said auto-differentiation). They can also be efficiently simplified by that toolbox before being pasted back into your program. Auto-diff can also be an important tool for people whose problem is simple or who are not concerned with optimal efficiency. This can easily be handled by a wrapper and has nothing to do with Newton optimization. If you want to talk about proper OO design and internal simplification, then you should focus on being able to pass a linear solver into your optimization method. That way you allow people to use iterative and sparse solvers when needed. If you are concerned about people getting derivatives wrong, you can do what MATLAB does: add an option that checks the analytical derivatives against finite differences when debugging.
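> As a sketch of that MATLAB-style derivative check (the names here are
> made up for illustration, not an existing Commons Math API): compare
> the user's analytical gradient against central finite differences
> before trusting it inside the optimizer:

```java
// Sketch of a debugging "derivative check": each analytical partial
// derivative is compared against a central finite difference.
import java.util.function.Function;

public class DerivativeCheck {

    // Returns true if grad matches central finite differences of f at x
    // to within tol, using step size h.
    static boolean check(Function<double[], Double> f,
                         Function<double[], double[]> grad,
                         double[] x, double h, double tol) {
        double[] g = grad.apply(x);
        for (int i = 0; i < x.length; i++) {
            double[] xp = x.clone(), xm = x.clone();
            xp[i] += h;
            xm[i] -= h;
            double fd = (f.apply(xp) - f.apply(xm)) / (2 * h); // central difference
            if (Math.abs(fd - g[i]) > tol) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // f(x,y) = x^2 + 3*x*y, with analytical gradient (2x + 3y, 3x).
        Function<double[], Double> f = v -> v[0] * v[0] + 3 * v[0] * v[1];
        Function<double[], double[]> grad =
            v -> new double[]{2 * v[0] + 3 * v[1], 3 * v[0]};
        System.out.println(check(f, grad, new double[]{1.0, 2.0}, 1e-5, 1e-6));
    }
}
```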
>
> This brings me to my second issue. There are important problems where computing the derivatives together with the function value is *significantly* faster than computing each alone. It is not clear to me why I am meeting such resistance on this. For example, say you are running some sort of simulation that can be trivially adjusted at the end to yield either a derivative or a function value. A very real example is the Fast Multipole Method, which takes time to compute a series expansion of the function, but the resulting expansion can then be differentiated analytically.
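> A minimal sketch of the kind of interface this calls for (illustrative
> names, not an existing Commons Math interface): one call returns the
> value and the gradient together, so any shared work is done once:

```java
// Sketch: a function interface whose single evaluation yields both the
// value and the gradient, letting shared intermediate work be reused.
public class ValueAndGradient {

    // Result of one evaluation: value plus gradient computed together.
    static final class Eval {
        final double value;
        final double[] gradient;
        Eval(double value, double[] gradient) {
            this.value = value;
            this.gradient = gradient;
        }
    }

    interface DifferentiableFunction {
        Eval evaluate(double[] x);
    }

    public static void main(String[] args) {
        // f(x,y) = x*y^2: the intermediate y^2 is "shared work" that
        // feeds both the value and the first gradient component.
        DifferentiableFunction f = x -> {
            double yy = x[1] * x[1];                 // computed once
            double value = x[0] * yy;
            double[] grad = {yy, 2 * x[0] * x[1]};   // (y^2, 2xy)
            return new Eval(value, grad);
        };
        Eval e = f.evaluate(new double[]{2.0, 3.0});
        System.out.println(e.value + " " + e.gradient[0] + " " + e.gradient[1]);
    }
}
```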
>
> The function interfaces I suggested were just a quick initial proposal, and I would be happy for anyone to change them in a way everyone feels positive about. I just don't want my second issue to be dismissed as a non-issue.
I think the discussion on these things has just started, and imho the
guys here are really open to discussing changes, and even to admitting
mistakes or bad design; it just takes time to digest feedback.
Also, the current design may be based on wrong assumptions (e.g. on
too-specific use cases) while the general case is handled in a
sub-optimal way, so we have to adapt.
just my 2 cents.
Thomas
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org