Hello.
>
> Now that I have some time, let me try to make my case clearly. First I want to say that
this is not some attack on the automaticdifferentation package. I love automaticdifferentation
and symbolic packages. I personally cannot compute a derivative without a computer for the
life of me. And I am also glad we are finally starting to agree on something :)
I don't think that we disagreed about the fact that there was a problem with
the interface to the user application. [_I_ started this thread hinting that
there was a problem (at least for me as a user, based on my usecase).]
[Then there were several misunderstandings about what was the problem and how
to solve it...]
>
> This discussion is about figuring out how an incorrect framework and storage affects
the performance of an optimization. That is why I am so worried.
I can understand that this is an important point for your usecase.
There are now two vantage points (at least) on two aspects to consider: You
seem to have experience where storage matter (while I don't); you worry
about "small" objects allocation (while I don't, anymore).
I'm not saying that your worries do not count; quite the contrary. CM is not
my "toolbox"; it's a library based on the knowledge of its developers
(obviously).
If you bring actual usecases (i.e. unit tests or real application code
that can be benchmarked) that will show worrying behaviour, it will
certainly trigger swift corrective action. [This has happened recently.]
In this area (performance), the problem is that intuition (or educated
guess, however you want to name it) is not a substitute for actual runs,
sometimes by a large amount. [When I started to use CM, I raised the issue
that a 3D vector was immutable (so that I had to create a new one whenever
its coordinates were changing). Surely this was a performance hit! Actually
it wasn't.]
Again, that does not mean that you are wrong in this case. It's just that we
cannot jump and change the code every time someone comes up with what
amounts to "conventional wisdom". If the person comes with a patch that
readily proves the point (at least marginally through microbenchmarking),
then we all benefit.
>
> So lets start with the basics. A Newton method must compute a descent step by solving
a linear equation
>
> H*p = g, (1)
>
> where H is the Hessian, g is the gradient, and p is the desired descent step. What I
am about to say also holds for nonlinear least squares method, where Hessian is approximated
using the Jacobian as
>
> H \approx J^T*J+\lambda I.
>
> Now, if you are not solving a simple optimization problem that you keep giving examples
for, you can easily have a Hessian be a very large matrix,
> like 1000x1000 or larger. Now, you better hope that you are storing H using your linear
algebra framework, otherwise eq. (1) computation is going to take a while.
> This is actually a very active area of research, and that is why having sparse linear
algebra (aren't you removing this? ;) ) and iterative solvers is important to a lot of people.
Yes we are removing it because it is buggy and _nobody_ stepped up to say
that it was important for CM users, and to help solve the problems in a way
consistent with reallife usage of such a feature.
As Sébastien said, you are warmly welcome to help us find a way to keep the
feature.
>
> What you are proposing as good OO style
This discussion has really nothing to do with "OO style", merely code reuse;
and it was my big misunderstanding (partly because of my lack of knowledge,
partly because of the lack of documentation on the targetted usages of
"DerivativeStructure", which IIUC now, are outside CM) that I believed
that the new "MultivariateDifferentiableFunction" was the way to go.
[Also, Luc had been moving very fast on this. And I couldn't keep up
although I had wondered earlier why this had to impact usage in the
"optimization" package.]
> is that if someone has a function that they want to optimize, they need to take what
is probably already a RealMatrix or [][], and create 1000x1000 DerivativeStructure objects,
that,
IIUC, that would be 1000 "DerivativeStructure" (not 1000x1000). If so, for
this example, using "DerivativeStructure" would lead to about a 4% increase
in storage memory.
> within the next 10 lines in the optimizer, are going to be converted back to a RealMatrix?
Not only have you just fragmented your heap space, but your garbage collector
> is going crazy, and you are wasting an incredible amount of memory.
This is the kind of assertions that really needs support from actual code.
[Again, I don't claim to know better; I think that it would really be useful
to accumulate a list of "do and don't" for Java and CM, in the form of
reproducible user experience.]
> Now imagine if your Jacobian is actually very simple to compute but large, then this
whole conversion back and forth actually takes much longer than the function evaluation.
We are willing to take this into account, but since I cannot see the
behaviour in my usecase (where the evaluation time overwhelms, by several
orders of magnitude, all such considerations), I do not have any incentive
to change what seems to be "efficient enough" (for the actually known
cases).
Again, your experience with other usecases would be very valuable to
analyze the efficiency of the CM implementations and improve them, if
necessary.
>
> So, why exactly are you insisting on taking this performance penalty?
It's the other way around. Until the performance penalty is proven, we've
decided that it's up to the developer willing to do the work to decide.
Again, if there is patch, it makes the matter much easier to decide on.
I admit that we decided to change the internal working of the optimizers, so
_we_ should have proved that it does not impact usability. Hence, again, the
usefulness of having a test suite of "real" applications which could be
benchmarked regularly in order to have performance regressions induced by
otherwise welcome changes.
[I raised that issue several times on the ML.]
> You claim that the optimizer can work better if it gets a DerivativeStructure, why?
This was (supposed to be) an improvement of the _internal_ design. [Several
weeks ago, Luc posted the problems he was encountering.]
> Where is the fact that you are holding a DerivativeStructure come into effect for a Newton
method? Can you provide any literature in that regard? The classical optimization bottleneck,
if not the actual function evaluation, is eq. (1). The optimization methodology is build on
top of years of research in computational linear algebra concepts, and does not care how the
matrices are actually computed. Efficient memory usage and speed is critical here because
in Newton optimizations eq. (1) is evaluated thousands of times. The goal of the Jacobian
is not to store derivatives it is to store a matrix of numbers that allows you to quadratically
approximate your target function.
OK. Then Luc's proposal seems appropriate.
There would be new interfaces defining the "optimize" method appropriately.
For algorithms that need the gradient, it must be provided in an additional
argument, of type "MultivariateVectorFunction"; for those that need, the
Jacobian, the argument would be of type "MultivariateMatrixFunction".
Do you agree?
>
> You guys are focusing on people using finite differences and trying to optimize a newton
method to use finite differences. This is not what the main purpose of a newton method is.
If you want a method that dynamically adjusts finite differences step size, you should switch
to BOBYQA, which was designed for this case.
Another misunderstanding probably. [The "stepSize" discussion was sort of
a digression.]
>
> Derivatives can be computed by a computer using a symbolic toolbox a priori (something
that I was referring to when I accidentally said autodifferenation). They can also be efficiently
simplified by that toolbox before being pasted back into you program. Autodiff could also
be an important tool for people if their problem is simple or they are not concerned with
optimal efficiency . This can easily be handled by a wrapper and has nothing to do with Newton
optimization.
Maybe we should also change the "NewtonSolver" (in package
"o.a.c.m.analysis.solvers"). [In the same way we'll do for the optimizer
with derivatives. Actually those two packages were improved in parallel so
that the interface would be very similar from a usage point of view.]
> If you want to talk about proper OO design and internal simplification then you should
focus on being able to pass a linear solver into your optimization method. This way you allow
people to use iterative and sparse solvers when needed.
This is a new feature; contributions are welcome. [CM is primarily
developed by people who use (part of) it; if they don't need that feature,
or don't know how to implement it, or do not have the time to implement it,
it won't be implemented.]
> If you are concerned about people getting derivatives wrong, you can do what MATLAB does,
which is to add an option to check the derivatives using finite differences when debugging.
IMHO, the toppriority feature to help in debugging is to introduce logging
statments!
>
> This brings me to my second issue. There are important problems where computation
> of the derivatives combined with the actual function evaluation is *significantly*
> faster than computing each alone. I am not clear why I am hitting such resistance
> with this. [...]
[If you think that you were met with strong resistance, I'd suggest that you
look in the archives for the discussions about exceptions in CM...]
Unless I'm mistaken, CM does not prevent you to write a class that takes
advantage of combining the computations.
> What I suggested as function interfaces was just an initial quick suggestion
> that I would be happy for anyone to change in a way that everyone feels positive
> about. I just don't want my second issue to be ignored like a nonissue.
I'm getting confused. Could you please restate what where those first and
second issue?
Thanks,
Gilles

To unsubscribe, email: devunsubscribe@commons.apache.org
For additional commands, email: devhelp@commons.apache.org
