commons-user mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: math : optim package documentation
Date Fri, 31 May 2013 12:48:11 GMT
Hello.

Such a discussion should probably take place on the "dev" ML.

On Fri, 31 May 2013 09:39:41 +0100 (BST), François Laferrière wrote:
> ________________________________
>  From: Gilles <gilles@harfang.homelinux.org>
> To: user@commons.apache.org
> Sent: Tuesday, 21 May 2013 0:09
> Subject: Re: math : optim package documentation
>
>
>> [...snip ..]
>>
>> However, before you start writing code that relies on this API,
>> be sure to read that recent thread:
>>  http://markmail.org/thread/3fjvucyz7rax4cyi
>> What is your opinion on the proposed change?
>>
>>
>> Regards,
>> Gilles
>
> I carefully read this thread.
>
> First things first, I think it is way better than the 3.2 API. Getting
> rid of the variable parameter list is a great improvement in terms of
> design AND usability. With the old API it was not possible, when using
> an IDE (like Eclipse), to explore the API through autocompletion.
> Further, the convenience class OptimizationData is not very good. It
> makes it necessary to shoehorn unrelated features into the same class
> hierarchy, and thus to make heavy use of "instanceof" (which is
> generally considered a symptom of poor class design).

"OptimizationData" is not an "old" API; it was introduced in 3.1.

I used this marker interface only reluctantly, as a way to solve an
annoying issue, namely having different setters for different algorithms,
although the "interesting" feature is the same (i.e. the "optimize" method).

This is especially a burden for the usage you described: trying
different optimization methods and parameters. With the current scheme,
the idea was that you create a list of "OptimizationData" and each
algorithm chooses which items to use.
I don't deny that this has drawbacks too (e.g. some users were expecting
all features to work with all algorithms).
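To make the "each algorithm chooses which to use" scheme concrete, here is a minimal sketch of the marker-interface idea. The classes below are simplified, hypothetical stand-ins loosely modelled on the CM 3.1 "optim" package (MaxEval, InitialGuess); StepSize and SketchOptimizer are invented for illustration only. The sketch also shows both drawbacks mentioned in this thread: the "instanceof" chain, and data silently ignored by an algorithm that does not understand it.

```java
// Hypothetical stand-ins for the marker-interface scheme; NOT the real
// Commons Math classes, only a simplified imitation of the idea.
interface OptimizationData {}

final class MaxEval implements OptimizationData {
    final int value;
    MaxEval(int value) { this.value = value; }
}

final class InitialGuess implements OptimizationData {
    final double[] point;
    InitialGuess(double[] point) { this.point = point.clone(); }
}

// Hypothetical data item that only some algorithms would understand.
final class StepSize implements OptimizationData {
    final double value;
    StepSize(double value) { this.value = value; }
}

final class SketchOptimizer {
    int maxEval = 100;      // default used when no MaxEval is passed
    double[] start = null;  // stays null when no InitialGuess is passed

    // Each algorithm scans the whole list and keeps only the items it
    // knows about; anything else (e.g. StepSize here) is silently ignored.
    void parseOptimizationData(OptimizationData... data) {
        for (OptimizationData d : data) {
            if (d instanceof MaxEval) {
                maxEval = ((MaxEval) d).value;
            } else if (d instanceof InitialGuess) {
                start = ((InitialGuess) d).point;
            }
        }
    }
}
```

Calling `parseOptimizationData(new MaxEval(500), new InitialGuess(new double[] {1, 2}), new StepSize(0.1))` accepts all three arguments, but the StepSize has no effect, which is exactly the kind of surprise users reported.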

We are going to move to the "fluent" API as it will indeed make it clearer
which input data is needed by which optimizer (and, probably, allow API
"auto-discovery" as you mention above).
[But this will also reinstate the previous drawback and, yes, require more
instances to be created (cf. below).]
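A minimal sketch of what an immutable "fluent" configuration could look like, to show where the extra instances come from. The class and method names (ImmutableOptimizer, withMaxEval, withInitialGuess) are hypothetical and not the actual CM design; each withXXX() call returns a new object and leaves the receiver untouched.

```java
// Hypothetical immutable optimizer configuration; not a Commons Math class.
// Every withXXX() call returns a NEW instance and leaves the receiver
// unchanged, which is why a fluent immutable API creates more instances.
final class ImmutableOptimizer {
    private final int maxEval;
    private final double[] initialGuess;

    ImmutableOptimizer() { this(100, new double[0]); }

    private ImmutableOptimizer(int maxEval, double[] initialGuess) {
        this.maxEval = maxEval;
        this.initialGuess = initialGuess.clone(); // defensive copy keeps it immutable
    }

    ImmutableOptimizer withMaxEval(int n) {
        return new ImmutableOptimizer(n, initialGuess);
    }

    ImmutableOptimizer withInitialGuess(double... x) {
        return new ImmutableOptimizer(maxEval, x);
    }

    int getMaxEval() { return maxEval; }

    double[] getInitialGuess() { return initialGuess.clone(); }
}
```

Because instances never change after construction, a configured optimizer can be shared between threads without extra checks; the price is one allocation per withXXX() call.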

>
> About immutability
>
> I see the advantages of immutability, but I am not convinced that it
> should be made mandatory. In my project, I need to call the same
> optimizer zillions of times, just changing some parameters of the
> objective function and the initial guess. Creating a new instance each
> time would, in this case, put too heavy a load on memory management
> (allocation and garbage collection of zillions of optimizer instances).

I'm afraid that you'll have to back this statement with actual benchmark
numbers.
CM objects are quite small and the JVM allocates them very efficiently
(from its own memory pool, not from the system's).

Also, whenever the function to be optimized is doing any significant work,
the optimization's duration is orders of magnitude longer than the
initialization's duration.

>
> Maybe it is not necessary to choose? Maybe it is possible to have an API
> with setXXX() (for computation-intensive apps) and withXXX()
> (immutability)? That way, the user has the choice.

Yes, at the cost of a heavier development and maintenance burden (e.g.
checking correctness in multi-threaded applications).

>
> In this double pattern, setXXX() can also be "fluent", so that we can
> write
>
>   MyOptimizer opt = new MyOptimizer().setXXX().setYYY();

This will be an option only if it is proven that it would significantly
increase the performance of an application.
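For reference, the "double pattern" proposed above could look roughly like the following sketch. DualStyleOptimizer and its fields are hypothetical; the point is that setXXX() mutates and returns `this` (no allocation), while withXXX() returns a modified copy.

```java
// Hypothetical class combining both styles from the proposal; not a CM API.
final class DualStyleOptimizer {
    private int maxEval;
    private double tolerance;

    DualStyleOptimizer(int maxEval, double tolerance) {
        this.maxEval = maxEval;
        this.tolerance = tolerance;
    }

    // Mutable and fluent: modifies this instance and returns it,
    // so no new object is allocated.
    DualStyleOptimizer setMaxEval(int n) { this.maxEval = n; return this; }
    DualStyleOptimizer setTolerance(double t) { this.tolerance = t; return this; }

    // Immutable style: returns a modified copy and leaves this instance alone.
    DualStyleOptimizer withMaxEval(int n) { return new DualStyleOptimizer(n, tolerance); }
    DualStyleOptimizer withTolerance(double t) { return new DualStyleOptimizer(maxEval, t); }

    int getMaxEval() { return maxEval; }
    double getTolerance() { return tolerance; }
}
```

Since setXXX() exists, no instance of this class can be assumed immutable: an object handed to another thread can still be changed through a setter, which is where the extra correctness checking comes from.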

>
> A "debuggable" interface?
>
> Always "scratching my own itch": in my project, I do need to compare
> the performance of different algorithms. So I need to keep track not
> only of the initial guess, final results and number of iterations, but
> also of intermediate results. For instance, it would be great to have an
> option to keep track of the trajectory, e.g. as an array of values
> that can be retrieved at the end. Something like
>
>
> class MyOptimizer implements DebuggableOptim ...
>
> MyOptimizer optim = new MyOptimizer().withDebuggable(true);
> PointValuePair resultPair = optim.optimize(...);
> PointValuePair[] trajectory = optim.getTrajectory();

This is a useful/important/indispensable feature for research or testing.
Many algorithms actually need it. But I'd like it to be a framework that
can be used globally, not to create a "Debuggable<Something>" for each
<Something> hierarchy in CM. That would bloat the codebase, with a lot of
duplication.
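One way to picture a "global" mechanism, as opposed to one Debuggable variant per hierarchy, is a single generic callback interface that any iterative algorithm could accept. The sketch below is hypothetical (IterationListener, TrajectoryRecorder and ToyGradientDescent are not CM classes); the toy 1-D gradient descent on f(x) = (x - 3)^2 exists only to show the hook in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical library-wide hook: any iterative algorithm reports its state
// through this one interface, so no per-hierarchy "Debuggable" is needed.
interface IterationListener<S> {
    void iterationPerformed(int iteration, S state);
}

// Generic recorder usable with any algorithm that accepts a listener.
final class TrajectoryRecorder<S> implements IterationListener<S> {
    private final List<S> trajectory = new ArrayList<>();

    @Override
    public void iterationPerformed(int iteration, S state) {
        trajectory.add(state);
    }

    List<S> getTrajectory() { return trajectory; }
}

// Toy 1-D gradient descent on f(x) = (x - 3)^2, only to exercise the hook.
final class ToyGradientDescent {
    static double minimize(double x0, double step, int iterations,
                           IterationListener<double[]> listener) {
        double x = x0;
        for (int k = 0; k < iterations; k++) {
            double grad = 2.0 * (x - 3.0);   // f'(x)
            x -= step * grad;
            double fx = (x - 3.0) * (x - 3.0);
            listener.iterationPerformed(k, new double[] {x, fx}); // (point, value)
        }
        return x;
    }
}
```

With `x0 = 0`, `step = 0.25` and 10 iterations, the recorder ends up holding 10 (point, value) pairs, the first point being 1.5; the same recorder class would work unchanged with any other algorithm exposing this hook.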

An alternative solution is to introduce logging statements: they would
create a trace of the inner workings, to be analyzed with external tools.
There is nothing to implement and it is trivial to use.
I've proposed this many times, but it stumbles on the requirement that
CM should not have any dependencies.
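For illustration only, here is what such trace statements might look like inside an iterative loop. This sketch assumes the JDK's built-in java.util.logging, which needs no extra jar. It is not a claim about what CM would adopt. The logger name, message format, and the bisection example (solving x^2 = 2) are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch of trace statements in an iterative algorithm using the JDK's
// java.util.logging. Logger name and message format are hypothetical.
final class TracedBisection {
    private static final Logger LOG = Logger.getLogger("sketch.optim.trace");

    // Bisection for f(x) = x^2 - 2 on [lo, hi]; each iteration emits one
    // FINE-level trace line that external tools could parse afterwards.
    static double solve(double lo, double hi, int iterations) {
        for (int k = 0; k < iterations; k++) {
            double mid = 0.5 * (lo + hi);
            double fm = mid * mid - 2.0;
            LOG.log(Level.FINE, "iter={0} x={1} f={2}", new Object[] {k, mid, fm});
            if (fm > 0) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return 0.5 * (lo + hi);
    }
}

// Minimal in-memory handler, e.g. for capturing the trace in a test.
final class CollectingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override public void publish(LogRecord r) { records.add(r); }
    @Override public void flush() {}
    @Override public void close() {}
}
```

Trace output stays disabled by default (FINE is below the default INFO level); a user enables it by setting the logger's level and attaching a handler, without any change to the algorithm's code.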


> Development agenda
>
> Is there a date at which I can expect a prototype optim package from
> the trunk?

Do you mean the implementation of the "fluent" interface?

>
> My current work
>
> As I wrote before, I am developing implementations of different
> gradient-based optimization methods not yet available in 3.2.
>
> To do so, I also implemented a general-purpose
>
>
> abstract public class NumericallyDerivableMultivariateFunction
> implements MultivariateFunction {
>  ...
>
> This class implements the gradient and the Hessian matrix using finite
> differences.
>
> If it can be of any use

You could certainly start a discussion on the "dev" ML about it,
together with examples of the proposed usage, then when we reach
some agreement on the design, provide code and unit tests.
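As a possible starting point for such a discussion, here is a minimal sketch of a function that provides its own gradient and Hessian by central finite differences. It is not the class described above, whose code was not posted. MultivariateFunctionLike is a local stand-in with the same shape as CM's MultivariateFunction (`double value(double[] point)`), so the sketch compiles without the library. FiniteDifferenceFunction and the fixed step `h` are assumptions.

```java
// Local stand-in mirroring the shape of CM's MultivariateFunction,
// so that this sketch compiles without Commons Math on the classpath.
interface MultivariateFunctionLike {
    double value(double[] point);
}

// Hypothetical sketch: gradient and Hessian by central finite differences
// with a fixed step h (no adaptive step-size selection).
abstract class FiniteDifferenceFunction implements MultivariateFunctionLike {
    private final double h;

    protected FiniteDifferenceFunction(double h) { this.h = h; }

    // g_i ~ [f(x + h e_i) - f(x - h e_i)] / (2h)
    public double[] gradient(double[] x) {
        int n = x.length;
        double[] g = new double[n];
        for (int i = 0; i < n; i++) {
            double[] xp = x.clone(), xm = x.clone();
            xp[i] += h;
            xm[i] -= h;
            g[i] = (value(xp) - value(xm)) / (2 * h);
        }
        return g;
    }

    // H_ij ~ [f(x+h e_i+h e_j) - f(x+h e_i-h e_j) - f(x-h e_i+h e_j) + f(x-h e_i-h e_j)] / (4h^2)
    // For i == j this reduces to the second difference with step 2h.
    public double[][] hessian(double[] x) {
        int n = x.length;
        double[][] hess = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double fpp = shifted(x, i, +h, j, +h);
                double fpm = shifted(x, i, +h, j, -h);
                double fmp = shifted(x, i, -h, j, +h);
                double fmm = shifted(x, i, -h, j, -h);
                hess[i][j] = (fpp - fpm - fmp + fmm) / (4 * h * h);
                hess[j][i] = hess[i][j]; // the Hessian is symmetric
            }
        }
        return hess;
    }

    // Evaluates f at x shifted by di along axis i and dj along axis j.
    private double shifted(double[] x, int i, double di, int j, double dj) {
        double[] y = x.clone();
        y[i] += di;
        y[j] += dj;
        return value(y);
    }
}
```

A user would subclass it and implement only `value`; for instance, with f(x, y) = x^2 + 3xy + 2y^2 the gradient at (1, 2) should come out close to (8, 11) and the Hessian close to [[2, 3], [3, 4]].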


Thanks,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

