Subject: Re: [Math] Old to new API ("MultivariateDifferentiable(Vector)Function")
From: Konstantin Berlin
Date: Sat, 1 Dec 2012 15:58:44 -0500
To: "Commons Developers List" <dev@commons.apache.org>

I forgot to say that there are commonly used benchmarks for optimization-algorithm
developers; they are routinely used to compare different algorithms in
publications. I am not personally familiar with them, but it would be easy
to google them.

On Dec 1, 2012, at 1:31 PM, Gilles Sadowski wrote:

> Hello.
>
>> Now that I have some time, let me try to make my case clearly. First I
>> want to say that this is not some attack on the automatic-differentiation
>> package. I love automatic differentiation and symbolic packages. I
>> personally cannot compute a derivative without a computer for the life
>> of me.
>> And I am also glad we are finally starting to agree on something :)
>
> I don't think that we disagreed about the fact that there was a problem
> with the interface to the user application. [_I_ started this thread
> hinting that there was a problem (at least for me as a user, based on my
> use-case).]
>
> [Then there were several misunderstandings about what the problem was
> and how to solve it...]
>
>> This discussion is about figuring out how an incorrect framework and
>> storage layout affect the performance of an optimization. That is why
>> I am so worried.
>
> I can understand that this is an important point for your use-case.
> There are now (at least) two vantage points on two aspects to consider:
> you seem to have experience where storage matters (while I don't); you
> worry about "small"-object allocation (while I don't, anymore).
>
> I'm not saying that your worries do not count; quite the contrary. CM
> is not my "toolbox"; it's a library based on the knowledge of its
> developers (obviously). If you bring actual use-cases (i.e. unit tests
> or real application code that can be benchmarked) that show worrying
> behaviour, it will certainly trigger swift corrective action. [This has
> happened recently.]
>
> In this area (performance), the problem is that intuition (or educated
> guess, however you want to name it) is not a substitute for actual
> runs, sometimes by a large amount. [When I started to use CM, I raised
> the issue that a 3D vector was immutable (so that I had to create a new
> one whenever its coordinates changed). Surely this was a performance
> hit! Actually it wasn't.]
>
> Again, that does not mean that you are wrong in this case. It's just
> that we cannot jump and change the code every time someone comes up
> with what amounts to "conventional wisdom".
> If the person comes with a patch that readily proves the point (at
> least marginally, through microbenchmarking), then we all benefit.
>
>> So let's start with the basics. A Newton method must compute a descent
>> step by solving a linear equation
>>
>>   H*p = -g,                                                    (1)
>>
>> where H is the Hessian, g is the gradient, and p is the desired
>> descent step. What I am about to say also holds for non-linear
>> least-squares methods, where the Hessian is approximated using the
>> Jacobian as
>>
>>   H \approx J^T*J + \lambda I.
>>
>> Now, if you are not solving a simple optimization problem of the kind
>> you keep giving examples for, you can easily have the Hessian be a
>> very large matrix, like 1000x1000 or larger. You had better hope that
>> you are storing H in your linear algebra framework, otherwise solving
>> eq. (1) is going to take a while. This is actually a very active area
>> of research, and that is why having sparse linear algebra (aren't you
>> removing this? ;) ) and iterative solvers is important to a lot of
>> people.
>
> Yes, we are removing it because it is buggy and _nobody_ stepped up to
> say that it was important for CM users, and to help solve the problems
> in a way consistent with real-life usage of such a feature. As
> Sébastien said, you are warmly welcome to help us find a way to keep
> the feature.
>
>> What you are proposing as good OO style
>
> This discussion has really nothing to do with "OO style", merely code
> reuse; and it was my big misunderstanding (partly because of my lack of
> knowledge, partly because of the lack of documentation on the targeted
> usages of "DerivativeStructure", which IIUC now are outside CM) that I
> believed that the new "MultivariateDifferentiableFunction" was the way
> to go.
>
> [Also, Luc had been moving very fast on this. And I couldn't keep up,
> although I had wondered earlier why this had to impact usage in the
> "optimization" package.]
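[Editorial aside: for concreteness, the step computation in eq. (1), with the
Gauss-Newton/Levenberg-Marquardt approximation H \approx J^T*J + \lambda I, can
be sketched in a few lines of plain Java. This is only an illustrative toy for
a 2-parameter problem; the class and method names and the direct Cramer's-rule
solve are mine, not Commons Math API. A real implementation would hand H to a
decomposition from a linear-algebra package, sparse or dense as appropriate.]

```java
// Toy illustration of the Newton/Gauss-Newton step in eq. (1):
//   solve (J^T J + lambda I) p = -g
// for a 2-parameter problem, using a direct Cramer's-rule solve.
// Plain Java, no Commons Math types; names are illustrative only.
public class NewtonStepSketch {

    /** Solves (jtj + lambda*I) p = -g for the 2x2 case. */
    static double[] step(double[][] jtj, double lambda, double[] g) {
        double a = jtj[0][0] + lambda, b = jtj[0][1];
        double c = jtj[1][0],          d = jtj[1][1] + lambda;
        double det = a * d - b * c;        // assumes H is non-singular
        double r0 = -g[0], r1 = -g[1];     // right-hand side -g
        return new double[] { (r0 * d - b * r1) / det,
                              (a * r1 - r0 * c) / det };
    }

    public static void main(String[] args) {
        // H = [[4, 1], [1, 3]], g = (1, 2)  ->  p = (-1/11, -7/11)
        double[] p = step(new double[][] {{4, 1}, {1, 3}}, 0.0,
                          new double[] {1, 2});
        System.out.println(p[0] + " " + p[1]);
    }
}
```

In a Newton iteration this solve runs at every step, which is why the thread's
point about storing H in the solver's native matrix format matters: any
per-iteration conversion cost is paid thousands of times.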
>> is that if someone has a function that they want to optimize, they
>> need to take what is probably already a RealMatrix or [][], and create
>> 1000x1000 DerivativeStructure objects, that,
>
> IIUC, that would be 1000 "DerivativeStructure" objects (not 1000x1000).
> If so, for this example, using "DerivativeStructure" would lead to
> about a 4% increase in storage memory.
>
>> within the next 10 lines in the optimizer, are going to be converted
>> back to a RealMatrix? Not only have you just fragmented your heap
>> space, but your garbage collector is going crazy, and you are wasting
>> an incredible amount of memory.
>
> This is the kind of assertion that really needs support from actual
> code. [Again, I don't claim to know better; I think that it would
> really be useful to accumulate a list of "do"s and "don't"s for Java
> and CM, in the form of reproducible user experience.]
>
>> Now imagine that your Jacobian is actually very simple to compute but
>> large; then this whole conversion back and forth actually takes much
>> longer than the function evaluation.
>
> We are willing to take this into account, but since I cannot see the
> behaviour in my use-case (where the evaluation time overwhelms, by
> several orders of magnitude, all such considerations), I do not have
> any incentive to change what seems to be "efficient enough" (for the
> actually known cases). Again, your experience with other use-cases
> would be very valuable to analyze the efficiency of the CM
> implementations and improve them, if necessary.
>
>> So, why exactly are you insisting on taking this performance penalty?
>
> It's the other way around. Until the performance penalty is proven,
> we've decided that it's up to the developer willing to do the work to
> decide. Again, if there is a patch, it makes the matter much easier to
> decide on.
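[Editorial aside: to make the storage debate above concrete, a
DerivativeStructure-style object packages a function value together with its
partial derivatives, which is exactly the per-entry payload being argued over.
The toy forward-mode "dual number" below is a simplified model of that idea
(one variable, first order only); it is emphatically NOT the Commons Math
API.]

```java
// Toy forward-mode "dual number": a value plus one first-order
// derivative. A DerivativeStructure-style object generalizes this to
// n partials (and higher orders), so building a large Jacobian out of
// such objects stores a derivative payload alongside every value --
// the overhead debated above. Illustrative only; not the Commons Math API.
public class Dual {
    final double val;  // function value
    final double der;  // derivative w.r.t. the single free variable

    Dual(double val, double der) { this.val = val; this.der = der; }

    /** The free variable x itself: dx/dx = 1. */
    static Dual variable(double x) { return new Dual(x, 1.0); }

    Dual add(Dual o) { return new Dual(val + o.val, der + o.der); }

    Dual mul(Dual o) {  // product rule: (fg)' = f'g + fg'
        return new Dual(val * o.val, der * o.val + val * o.der);
    }

    public static void main(String[] args) {
        // f(x) = x*x + x at x = 3  ->  f = 12, f' = 2x + 1 = 7
        Dual x = variable(3.0);
        Dual f = x.mul(x).add(x);
        System.out.println(f.val + " " + f.der);
    }
}
```

Whether wrapping every Jacobian entry in such objects (and unwrapping them a
few lines later) costs anything measurable is precisely what the thread says
should be settled by a benchmark rather than by intuition.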
>
> I admit that we decided to change the internal workings of the
> optimizers, so _we_ should have proved that it does not impact
> usability. Hence, again, the usefulness of having a test suite of
> "real" applications which could be benchmarked regularly in order to
> catch performance regressions induced by otherwise welcome changes.
> [I raised that issue several times on the ML.]
>
>> You claim that the optimizer can work better if it gets a
>> DerivativeStructure; why?
>
> This was (supposed to be) an improvement of the _internal_ design.
> [Several weeks ago, Luc posted the problems he was encountering.]
>
>> Where does the fact that you are holding a DerivativeStructure come
>> into effect for a Newton method? Can you provide any literature in
>> that regard? The classical optimization bottleneck, if not the actual
>> function evaluation, is eq. (1). The optimization methodology is
>> built on top of years of research in computational linear algebra,
>> and does not care how the matrices are actually computed. Efficient
>> memory usage and speed are critical here because in Newton
>> optimizations eq. (1) is solved thousands of times. The goal of the
>> Jacobian is not to store derivatives; it is to store a matrix of
>> numbers that allows you to quadratically approximate your target
>> function.
>
> OK. Then Luc's proposal seems appropriate. There would be new
> interfaces defining the "optimize" method appropriately. For
> algorithms that need the gradient, it must be provided in an
> additional argument, of type "MultivariateVectorFunction"; for those
> that need the Jacobian, the argument would be of type
> "MultivariateMatrixFunction". Do you agree?
>
>> You guys are focusing on people using finite differences and trying
>> to optimize a Newton method to use finite differences. That is not
>> the main purpose of a Newton method.
>> If you want a method that dynamically adjusts the finite-difference
>> step size, you should switch to BOBYQA, which was designed for this
>> case.
>
> Another misunderstanding, probably. [The "stepSize" discussion was
> sort of a digression.]
>
>> Derivatives can be computed by a computer using a symbolic toolbox a
>> priori (something I was referring to when I accidentally said
>> auto-differentiation). They can also be efficiently simplified by
>> that toolbox before being pasted back into your program. Auto-diff
>> could also be an important tool for people if their problem is simple
>> or they are not concerned with optimal efficiency. This can easily be
>> handled by a wrapper and has nothing to do with Newton optimization.
>
> Maybe we should also change the "NewtonSolver" (in package
> "o.a.c.m.analysis.solvers"). [In the same way we'll do for the
> optimizer with derivatives. Actually, those two packages were improved
> in parallel so that the interfaces would be very similar from a usage
> point of view.]
>
>> If you want to talk about proper OO design and internal
>> simplification, then you should focus on being able to pass a linear
>> solver into your optimization method. This way you allow people to
>> use iterative and sparse solvers when needed.
>
> This is a new feature; contributions are welcome. [CM is primarily
> developed by people who use (part of) it; if they don't need a
> feature, or don't know how to implement it, or do not have the time to
> implement it, it won't be implemented.]
>
>> If you are concerned about people getting derivatives wrong, you can
>> do what MATLAB does, which is to add an option to check the
>> derivatives using finite differences when debugging.
>
> IMHO, the top-priority feature to help in debugging is to introduce
> logging statements!
>
>> This brings me to my second issue.
>> There are important problems where computation of the derivatives
>> combined with the actual function evaluation is *significantly*
>> faster than computing each alone. I am not clear why I am hitting
>> such resistance with this. [...]
>
> [If you think that you were met with strong resistance, I'd suggest
> that you look in the archives for the discussions about exceptions in
> CM...]
>
> Unless I'm mistaken, CM does not prevent you from writing a class that
> takes advantage of combining the computations.
>
>> What I suggested as function interfaces was just an initial quick
>> suggestion that I would be happy for anyone to change in a way that
>> everyone feels positive about. I just don't want my second issue to
>> be ignored as a non-issue.
>
> I'm getting confused. Could you please restate what those first and
> second issues were?
>
> Thanks,
> Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org