commons-dev mailing list archives

From Luc Maisonobe <...@spaceroots.org>
Subject [math] fluent/builder API for algorithms in 4.0 ?
Date Mon, 29 Apr 2013 16:40:52 GMT
Hi all,

Since the 2.x series, we have been struggling in several areas with the
API for algorithms. The latest change concerned optimizers, but it is
only one example among others (solvers, integration, ODE and maybe some
parts of statistics may be concerned by the proposal below).

The various properties we want to keep, and which are not always
compatible with each other, are:

 1) simple use
 2) immutability
 3) good OO design
 4) compatible with reference algorithms implementations
 5) maintainable
 6) extensible
 7) backward compatibility
 8) probably many other characteristics ...

3) and 4) often don't work together. 1), 6) and 7) are difficult to
handle at once.

If we look at optimizers, some progress has been made with respect to
extensibility and backward compatibility, but simple use was clearly
left behind: it is difficult to know which optimizer supports which
feature, as neither strong typing nor fixed arguments are used anymore.
However, keeping the older API would have prevented extensibility, as
the combinatorial explosion of arguments gets worse as features are
added (and we still need to add several constraint types).

If we look at ODE solvers, we are still using the original API from
Mantissa, but each time we add a new feature, we add more and more
setters, thus drifting farther and farther from immutability, and
imposing some unwritten ordering between calls (for example when we set
up additional equations, we must also set up the initial additional
state, and the user must set up a way to retrieve the final additional
state).
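
To make the ordering issue concrete, here is a hypothetical sketch
(invented names, not the actual [math] classes) of what such a
setter-based sequence looks like:

  MutableIntegrator integ = new MutableIntegrator(minStep, maxStep);
  integ.setAdditionalEquations(extraEqns);    // must come first,
  integ.setInitialAdditionalState(extra0);    // then this,
  integ.setAdditionalStateHandler(handler);   // then this;
  // nothing in the types enforces this ordering: a wrong order or a
  // missing call only shows up at run time
  integ.integrate(mainEqns, t0, y0, t1, y1);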

If we look at solvers, we started with some parameters set up during the
call to solve while others were set up at construction time, but this
split has changed over time.

So I would like to suggest a new approach, largely inspired by a recent
discussion on the [CSV] component about the builder API (see
<http://markmail.org/thread/o3s2a5hyj6xh4nzj>), by an older discussion
on [math] about using a fluent API for vectors (see
<http://markmail.org/message/2gmg6wnpm5p2splb>), and by a talk Simone
gave last year at ApacheCon Europe. The idea is to use a fluent API to
build the algorithm progressively, adding features one at a time using
withXxx methods defined in interfaces.

As an example, consider just a few of the features used in optimization:
constraints, iteration limits, evaluation limits, search intervals,
bracketing steps ... Some features are used in several optimizers, some
are specific to univariate solvers, some can be used in a family of
solvers ... Trying to fit everything into a single class hierarchy is
impossible. We tried, and I don't think we succeeded.

If we consider each feature separately, we could define one interface
per feature, as follows:

  interface Constrainable<T extends Constrainable<T>>
            extends Optimizer {
    /** Returns a new optimizer, handling an additional constraint.
     * @param c the constraint to add
     * @return a new optimizer handling the constraint
     * (note that the instance itself is not changed)
     */
    T withConstraint(Constraint c);
  }
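
The IterationLimited interface used in the example below would follow
exactly the same pattern (a sketch, using the same self-referencing
generics):

  interface IterationLimited<T extends IterationLimited<T>>
            extends Optimizer {
    /** Returns a new optimizer with an updated iterations limit.
     * @param maxIter maximal number of iterations
     * @return a new optimizer using the given limit
     * (note that the instance itself is not changed)
     */
    T withMaxIter(int maxIter);
  }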

Basically, these interfaces would be used where OptimizationData is used
today. An optimizer that supports simple bounds constraints and max
iterations would be defined as:

  public class TheOptimizer
         implements Optimizer,
                    Constrainable<TheOptimizer>,
                    IterationLimited<TheOptimizer> {

    private final int maxIter;
    private final List<Constraint> constraints;

    // internal constructor used for fluent API
    private TheOptimizer(..., int maxIter, List<Constraint> constraints) {
      ...
      this.maxIter     = maxIter;
      this.constraints = constraints;
    }

    public TheOptimizer withConstraint(Constraint c) {
      List<Constraint> l = new ArrayList<Constraint>(constraints);
      l.add(c);
      return new TheOptimizer(..., maxIter, l);
    }

    public TheOptimizer withMaxIter(int m) {
      return new TheOptimizer(..., m, constraints);
    }

  }
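
Setting up such an optimizer would then be a simple chain of calls
(assuming some public way to get an initial instance, for example a
public constructor with default settings, as in the Algorithm example
further below):

  TheOptimizer optimizer = new TheOptimizer()
                               .withMaxIter(1000)
                               .withConstraint(c1)
                               .withConstraint(c2);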

So basically, the withXxx methods are sort-of setters, but they preserve
immutability (we do not return this, we return a new object). It is easy
to add features to existing classes and there is no need to shove
everything into a single hierarchy: we have a forest, not a tree. When
looking at the API, users clearly see what they can use and what they
cannot use: if an optimizer does not support constraints, there will be
no way to put a constraint into it. If in a later version constraints
become available, the existing methods will not be changed, only new
ones will appear.
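
This strong typing also composes at call sites: generic code that needs
several features can require them all in its signature, regardless of
the concrete optimizer class (a sketch based on the interfaces above):

  // accepts any optimizer that supports both constraints and an
  // iterations limit, whatever its concrete class
  public static <T extends Constrainable<T> & IterationLimited<T>>
      T configure(T optimizer, Constraint c, int maxIter) {
    return optimizer.withConstraint(c).withMaxIter(maxIter);
  }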

Of course, this creates a bunch of intermediate objects, but they are
often quite small and the setup part is not the most
computation-intensive one. It also becomes possible to do some
parametric studies on some features, using code like:

  Algorithm core = new Algorithm().withA(a).withB(b).withC(c);
  for (double d = 0.0; d < 1.0; d += 0.001) {
    Algorithm dSpecial = core.withD(d);
    double result = dSpecial.run();
    System.out.println(" d = " + d + ", result = " + result);
  }

This would work for someone who considers feature A a core feature that
should be fixed and feature D a parameter, but it would work equally
well for someone considering the opposite case: they simply write the
loop the other way, with the call to withD outside the loop and the call
to withA inside it.
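
For that opposite case, the only change is which withXxx call sits
inside the loop:

  Algorithm core = new Algorithm().withB(b).withC(c).withD(d);
  for (double a = 0.0; a < 1.0; a += 0.001) {
    Algorithm aSpecial = core.withA(a);
    double result = aSpecial.run();
    System.out.println(" a = " + a + ", result = " + result);
  }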

A side effect is that it also becomes possible to copy algorithms safely
by just resetting a feature, even when we don't really know which
implementation we have. A typical example that creates problems for me
is duplicating an ODE solver. It cannot be done currently, as some
elements required at construction time depend on the exact type of
solver used (tolerance vectors for adaptive stepsize integrators). So if
for example I want to do some Monte-Carlo analysis in parallel and need
to duplicate an integrator, I would do it as follows:

  FirstOrderIntegrator[]
      duplicate(FirstOrderIntegrator integrator, int n) {
    FirstOrderIntegrator[] copies = new FirstOrderIntegrator[n];
    for (int i = 0; i < n; ++i) {
      copies[i] =
        integrator.withMaxEvaluations(integrator.getMaxEvaluations());
    }
    return copies;
  }
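
Note that duplicate never needs to know the concrete type: if the
integrator is, say, an adaptive stepsize one, its tolerance vectors
simply travel with the instance (assuming the withXxx methods are
retrofitted onto the existing integrators):

  // tolerance settings are carried by the instance itself,
  // duplicate does not need to know about them
  FirstOrderIntegrator template =
      new DormandPrince853Integrator(1.0e-8, 100.0, 1.0e-10, 1.0e-10);
  FirstOrderIntegrator[] copies = duplicate(template, 16);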

This kind of API could be extended to several algorithms, so it may be
set up in a consistent way across the library. As I wrote at the
beginning of this message, I am thinking first of root solvers,
optimizers and ODE.

What do you think?
Luc
