Subject: Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")
From: Konstantin Berlin
Date: Sat, 1 Dec 2012 15:58:44 -0500
To: "Commons Developers List" <dev@commons.apache.org>

I forgot to say that there are commonly used benchmarks for optimization-algorithm
developers; they are routinely used to compare different algorithms in
publications. I am not personally familiar with them, but it would be easy
to google them.

On Dec 1, 2012, at 1:31 PM, Gilles Sadowski wrote:

> Hello.
>
>> Now that I have some time, let me try to make my case clearly. First I
>> want to say that this is not some attack on the automatic-differentiation
>> package. I love automatic differentiation and symbolic packages. I
>> personally cannot compute a derivative without a computer for the life
>> of me.
>> And I am also glad we are finally starting to agree on something :)
>
> I don't think that we disagreed about the fact that there was a problem
> with the interface to the user application. [_I_ started this thread
> hinting that there was a problem (at least for me as a user, based on my
> use-case).]
>
> [Then there were several misunderstandings about what the problem was
> and how to solve it...]
>
>> This discussion is about figuring out how an incorrect framework and
>> storage layout affect the performance of an optimization. That is why
>> I am so worried.
>
> I can understand that this is an important point for your use-case.
> There are now (at least) two vantage points on two aspects to consider:
> you seem to have experience where storage matters (while I don't); you
> worry about "small"-object allocation (while I don't, anymore).
>
> I'm not saying that your worries do not count; quite the contrary. CM
> is not my "toolbox"; it's a library based on the knowledge of its
> developers (obviously). If you bring actual use-cases (i.e. unit tests
> or real application code that can be benchmarked) that show worrying
> behaviour, it will certainly trigger swift corrective action. [This has
> happened recently.]
>
> In this area (performance), the problem is that intuition (or educated
> guess, however you want to name it) is not a substitute for actual
> runs, sometimes by a large amount. [When I started to use CM, I raised
> the issue that a 3D vector was immutable (so that I had to create a new
> one whenever its coordinates changed). Surely this was a performance
> hit! Actually it wasn't.]
>
> Again, that does not mean that you are wrong in this case. It's just
> that we cannot jump and change the code every time someone comes up
> with what amounts to "conventional wisdom".
> If the person comes with a patch that readily proves the point (at
> least marginally, through microbenchmarking), then we all benefit.
>
>> So let's start with the basics. A Newton method must compute a descent
>> step by solving a linear equation
>>
>>   H*p = -g,                                                    (1)
>>
>> where H is the Hessian, g is the gradient, and p is the desired
>> descent step. What I am about to say also holds for non-linear
>> least-squares methods, where the Hessian is approximated using the
>> Jacobian as
>>
>>   H \approx J^T*J + \lambda I.
>>
>> Now, if you are not solving a simple optimization problem of the kind
>> you keep giving examples for, you can easily have the Hessian be a
>> very large matrix, like 1000x1000 or larger. You had better hope that
>> you are storing H in your linear algebra framework, otherwise solving
>> eq. (1) is going to take a while. This is actually a very active area
>> of research, and that is why having sparse linear algebra (aren't you
>> removing this? ;) ) and iterative solvers is important to a lot of
>> people.
>
> Yes, we are removing it because it is buggy and _nobody_ stepped up to
> say that it was important for CM users, and to help solve the problems
> in a way consistent with real-life usage of such a feature. As
> Sébastien said, you are warmly welcome to help us find a way to keep
> the feature.
>
>> What you are proposing as good OO style
>
> This discussion has really nothing to do with "OO style", merely code
> reuse; and it was my big misunderstanding (partly because of my lack of
> knowledge, partly because of the lack of documentation on the targeted
> usages of "DerivativeStructure", which IIUC now are outside CM) that I
> believed that the new "MultivariateDifferentiableFunction" was the way
> to go.
>
> [Also, Luc had been moving very fast on this. And I couldn't keep up,
> although I had wondered earlier why this had to impact usage in the
> "optimization" package.]
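[Editorial aside: for concreteness, the step computation in eq. (1), with the
Gauss-Newton/Levenberg-Marquardt approximation H \approx J^T*J + \lambda I, can
be sketched in a few lines of plain Java. This is only an illustrative toy for
a 2-parameter problem; the class and method names and the direct Cramer's-rule
solve are mine, not Commons Math API. A real implementation would hand H to a
decomposition from a linear-algebra package, sparse or dense as appropriate.]

```java
// Toy illustration of the Newton/Gauss-Newton step in eq. (1):
//   solve (J^T J + lambda I) p = -g
// for a 2-parameter problem, using a direct Cramer's-rule solve.
// Plain Java, no Commons Math types; names are illustrative only.
public class NewtonStepSketch {

    /** Solves (jtj + lambda*I) p = -g for the 2x2 case. */
    static double[] step(double[][] jtj, double lambda, double[] g) {
        double a = jtj[0][0] + lambda, b = jtj[0][1];
        double c = jtj[1][0],          d = jtj[1][1] + lambda;
        double det = a * d - b * c;        // assumes H is non-singular
        double r0 = -g[0], r1 = -g[1];     // right-hand side -g
        return new double[] { (r0 * d - b * r1) / det,
                              (a * r1 - r0 * c) / det };
    }

    public static void main(String[] args) {
        // H = [[4, 1], [1, 3]], g = (1, 2)  ->  p = (-1/11, -7/11)
        double[] p = step(new double[][] {{4, 1}, {1, 3}}, 0.0,
                          new double[] {1, 2});
        System.out.println(p[0] + " " + p[1]);
    }
}
```

In a Newton iteration this solve runs at every step, which is why the thread's
point about storing H in the solver's native matrix format matters: any
per-iteration conversion cost is paid thousands of times.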
>> is that if someone has a function that they want to optimize, they
>> need to take what is probably already a RealMatrix or [][], and create
>> 1000x1000 DerivativeStructure objects, that,
>
> IIUC, that would be 1000 "DerivativeStructure" objects (not 1000x1000).
> If so, for this example, using "DerivativeStructure" would lead to
> about a 4% increase in storage memory.
>
>> within the next 10 lines in the optimizer, are going to be converted
>> back to a RealMatrix? Not only have you just fragmented your heap
>> space, but your garbage collector is going crazy, and you are wasting
>> an incredible amount of memory.
>
> This is the kind of assertion that really needs support from actual
> code. [Again, I don't claim to know better; I think that it would
> really be useful to accumulate a list of "do"s and "don't"s for Java
> and CM, in the form of reproducible user experience.]
>
>> Now imagine that your Jacobian is actually very simple to compute but
>> large; then this whole conversion back and forth actually takes much
>> longer than the function evaluation.
>
> We are willing to take this into account, but since I cannot see the
> behaviour in my use-case (where the evaluation time overwhelms, by
> several orders of magnitude, all such considerations), I do not have
> any incentive to change what seems to be "efficient enough" (for the
> actually known cases). Again, your experience with other use-cases
> would be very valuable to analyze the efficiency of the CM
> implementations and improve them, if necessary.
>
>> So, why exactly are you insisting on taking this performance penalty?
>
> It's the other way around. Until the performance penalty is proven,
> we've decided that it's up to the developer willing to do the work to
> decide. Again, if there is a patch, it makes the matter much easier to
> decide on.
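[Editorial aside: to make the storage debate above concrete, a
DerivativeStructure-style object packages a function value together with its
partial derivatives, which is exactly the per-entry payload being argued over.
The toy forward-mode "dual number" below is a simplified model of that idea
(one variable, first order only); it is emphatically NOT the Commons Math
API.]

```java
// Toy forward-mode "dual number": a value plus one first-order
// derivative. A DerivativeStructure-style object generalizes this to
// n partials (and higher orders), so building a large Jacobian out of
// such objects stores a derivative payload alongside every value --
// the overhead debated above. Illustrative only; not the Commons Math API.
public class Dual {
    final double val;  // function value
    final double der;  // derivative w.r.t. the single free variable

    Dual(double val, double der) { this.val = val; this.der = der; }

    /** The free variable x itself: dx/dx = 1. */
    static Dual variable(double x) { return new Dual(x, 1.0); }

    Dual add(Dual o) { return new Dual(val + o.val, der + o.der); }

    Dual mul(Dual o) {  // product rule: (fg)' = f'g + fg'
        return new Dual(val * o.val, der * o.val + val * o.der);
    }

    public static void main(String[] args) {
        // f(x) = x*x + x at x = 3  ->  f = 12, f' = 2x + 1 = 7
        Dual x = variable(3.0);
        Dual f = x.mul(x).add(x);
        System.out.println(f.val + " " + f.der);
    }
}
```

Whether wrapping every Jacobian entry in such objects (and unwrapping them a
few lines later) costs anything measurable is precisely what the thread says
should be settled by a benchmark rather than by intuition.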
>
> I admit that we decided to change the internal workings of the
> optimizers, so _we_ should have proved that it does not impact
> usability. Hence, again, the usefulness of having a test suite of
> "real" applications which could be benchmarked regularly in order to
> catch performance regressions induced by otherwise welcome changes.
> [I raised that issue several times on the ML.]
>
>> You claim that the optimizer can work better if it gets a
>> DerivativeStructure; why?
>
> This was (supposed to be) an improvement of the _internal_ design.
> [Several weeks ago, Luc posted the problems he was encountering.]
>
>> Where does the fact that you are holding a DerivativeStructure come
>> into effect for a Newton method? Can you provide any literature in
>> that regard? The classical optimization bottleneck, if not the actual
>> function evaluation, is eq. (1). The optimization methodology is
>> built on top of years of research in computational linear algebra,
>> and does not care how the matrices are actually computed. Efficient
>> memory usage and speed are critical here because in Newton
>> optimizations eq. (1) is solved thousands of times. The goal of the
>> Jacobian is not to store derivatives; it is to store a matrix of
>> numbers that allows you to quadratically approximate your target
>> function.
>
> OK. Then Luc's proposal seems appropriate. There would be new
> interfaces defining the "optimize" method appropriately. For
> algorithms that need the gradient, it must be provided in an
> additional argument, of type "MultivariateVectorFunction"; for those
> that need the Jacobian, the argument would be of type
> "MultivariateMatrixFunction". Do you agree?
>
>> You guys are focusing on people using finite differences and trying
>> to optimize a Newton method to use finite differences. That is not
>> the main purpose of a Newton method.
>> If you want a method that dynamically adjusts the finite-difference
>> step size, you should switch to BOBYQA, which was designed for this
>> case.
>
> Another misunderstanding, probably. [The "stepSize" discussion was
> sort of a digression.]
>
>> Derivatives can be computed by a computer using a symbolic toolbox a
>> priori (something I was referring to when I accidentally said
>> auto-differentiation). They can also be efficiently simplified by
>> that toolbox before being pasted back into your program. Auto-diff
>> could also be an important tool for people if their problem is simple
>> or they are not concerned with optimal efficiency. This can easily be
>> handled by a wrapper and has nothing to do with Newton optimization.
>
> Maybe we should also change the "NewtonSolver" (in package
> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the
> optimizer with derivatives. Actually, those two packages were improved
> in parallel so that the interfaces would be very similar from a usage
> point of view.]
>
>> If you want to talk about proper OO design and internal
>> simplification, then you should focus on being able to pass a linear
>> solver into your optimization method. This way you allow people to
>> use iterative and sparse solvers when needed.
>
> This is a new feature; contributions are welcome. [CM is primarily
> developed by people who use (part of) it; if they don't need a
> feature, or don't know how to implement it, or do not have the time to
> implement it, it won't be implemented.]
>
>> If you are concerned about people getting derivatives wrong, you can
>> do what MATLAB does, which is to add an option to check the
>> derivatives using finite differences when debugging.
>
> IMHO, the top-priority feature to help in debugging is to introduce
> logging statements!
>
>> This brings me to my second issue.
>> There are important problems where computation of the derivatives
>> combined with the actual function evaluation is *significantly*
>> faster than computing each alone. I am not clear why I am hitting
>> such resistance with this. [...]
>
> [If you think that you were met with strong resistance, I'd suggest
> that you look in the archives for the discussions about exceptions in
> CM...]
>
> Unless I'm mistaken, CM does not prevent you from writing a class that
> takes advantage of combining the computations.
>
>> What I suggested as function interfaces was just an initial quick
>> suggestion that I would be happy for anyone to change in a way that
>> everyone feels positive about. I just don't want my second issue to
>> be ignored as a non-issue.
>
> I'm getting confused. Could you please restate what those first and
> second issues were?
>
> Thanks,
> Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org