commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Evan Ward <>
Subject Re: [Math] Fluent API, inheritance and immutability
Date Mon, 12 Aug 2013 17:55:41 GMT

On 08/12/2013 12:19 PM, Phil Steitz wrote:
> On 8/12/13 9:04 AM, Evan Ward wrote:
>> On 08/09/2013 06:55 PM, Gilles wrote:
>>>>> [...]
>>>>>> This does make it important to decide on a well written and
>>>>>> complete API before releasing it.
>>>>> When the scope of the software is well circumscribed, that would be
>>>>> possible. With the whole of [Math]ematics, much less so. :-}
>>>>> And state-of-the-art in Java is a moving target, aimed at by changing
>>>>> CM contributors with differing needs and tastes; this adds to the
>>>>> unstable mix.
>>>> That's a good point. I still prefer the interface design (though I may
>>>> be in the minority) for two reasons. First, if a concrete class only
>>>> publicly exposes the methods defined in an interface it encourages
>>>> polymorphism. User code that uses one implementation can be easily
>>>> switched to another and new implementations are less constrained.
>>>> Second, it encourages composition over inheritance. I agree with Josh
>>>> Bloch that composition produces more maintainable code. Adding new
>>>> methods to an existing interface/class breaks composition.
>>> The "problem" is to get the interface right. As it happens, at some point
>>> we discover something that was not foreseen; and to correct/improve the
>>> design, compatibility must be broken.
>>> [But refactoring is not a failure of development; it's part of it.]
>>>> I think the interface/abstract class discussion is partially separable
>>>> from the immutable/mutable discussion. I see the algorithm as the part
>>>> that could really benefit from the polymorphism. Perhaps separating the
>>>> problem definition (data) from the algorithm will improve the
>>>> flexibility of the API. For example,
>>>> PointVectorValuePair solveMyNLLSProblem(NLLSOptimizer opt){
>>>>     //define problem to solve in an independent object
>>>>     NLLSProblem p = new NLLSProblem(/*model functions, weights,
>>>> convergence checker, ...*/);
>>>>     //provide algorithm with the data it needs
>>>>     //algorithm has no problem specific state
>>>>     return opt.optimize(p);
>>>> }
>>> I may be missing something, but how much better is it to store
>>> everything the optimizer needs in yet another class?
>>> [Then, that's a possible approach, but it's not what we started
>>> from in Commons Math, and when trying to fix some inconsistency
>>> or removing duplicate code, I tried to retain what could be from
>>> the existing design.]
>> I've looked at the implementations of GN and LM and it seems that the
>> abstract classes are primarily concerned with evaluating the model, and
>> the concrete classes use the evaluation to compute the next step. I
>> think separating those two concerns could simplify the implementation of
>> the optimizers. The NLLSProblem class would provide methods to evaluate
>> the weighted Jacobian, residuals, etc. The concrete class would look
>> almost the same as they do today, except calls to the parent class would
>> be replaced with calls to the NLLSProblem class.
>> The benefit is that the optimizer implementation/hierarchy would be
>> simpler and other classes could use the methods for model evaluation. I
>> don't think keeping the optimization algorithm and data in the same
>> class gains much since all the scratch space is reallocated on each call
>> to optimize. (Performance will be about the same either way.)
>>>>>> [...] Thread safety is a tricky beast. I think we agree that the
>>>>>> way to guarantee thread safety is to only depend on final concrete
>>>>>> classes that are thread safe themselves.
>>>>> I don't think so. When objects are immutable, thread-safety follows
>>>> It is somewhat off topic, but a counter example would be Vector3D. Since
>>>> the class is not final, a user could extend it and override all the
>>>> methods and add some set{X,Y,Z} methods to make it mutable. Even though
>>>> Vector3D is immutable, there is no _guarantee_ that every instanceof
>>>> Vector3D is immutable unless it is final. This is why String is final.
>>> I think I don't get your point: If someone extends a class that is safe
>>> in a way that the extension is unsafe, that's his problem. ;-)
>>>>>> [...] copying any large matrices or arrays is prohibitively
>>>>>> expensive. For the NLLS package we would be copying a pointer to
>>>>>> function that can generate a large matrix. I think adding some
>>>>>> documentation that functions should be thread safe if you want to
>>>>>> them from multiple threads would be sufficient.
>>>>> I you pass a "pointer" (i.e. a "reference" in Java), all bets are off:
>>>>> the
>>>>> class is not inherently thread-safe. That's why I suggested to
>>>>> mandate a
>>>>> _deep_ "copy" method (with a stringent contract that should allow a
>>>>> caller
>>>>> to be sure that all objects owned by an instance are disconnected from
>>>>> any
>>>>> other objects).
>>>> As someone who has designed a thread safe application based on deep
>>>> copying I don't think this is route to follow. A deep copy means you
>>>> have to be able to copy an arbitrary (possibly cyclical) reference
>>>> graph. Without the graph copy there are many subtle bugs. (References to
>>>> the same object are now references to different objects.) With the graph
>>>> copy the implementation is very complex. This is the reason
>>>> Serialization has a separate "patch up" step after object creation,
>>>> which leads to some nasty tricks/bugs. Similarly, Cloneable only
>>>> produces a shallow copy. Opinions may vary, but in my experience
>>>> immutability is an easier approach to thread safety, especially when you
>>>> have to depend on user code.
>>> I agree that using immutability is easier, but my point all along is that
>>> it is at odds with simplicity (which is aimed at with "fluent API").
>>> And since
>>> 1. the internals of the optimizers are not thread-safe yet (see e.g.
>>>    LevenbergMarquardtOptimizer"), and
>> Though the LM implementation looks like a mess in Java, the Fortran it
>> was translated from (thanks Luc) is purely procedural. The required
>> change would be allocating the arrays in doOptimize() as locals instead
>> of as instance variables.
>> Phil Steitz wrote:
>>>> In my use case I have a class that solves several related, but
>>>> different, NLLS problems concurrently. I would like the optimization
>>>> algorithm to be a configurable dependency (GN or LM). Currently I have a
>>>> custom (thread safe) interface with two implementations that wrap the
>>>> commons math optimizers, in order to provide thread safety and
>>>> polymorphic access to all the options that both optimizers support.
>>> Can you explain this a little more?  I am still not getting the use
>>> case requiring concurrent access to a single optimizer instance.  It
>>> appears that you have one.  It would be good to understand it better.
>> Sure. I have class that has to solve different NLLS problems
>> concurrently. The problems share the same convergence criteria, but use
>> different model functions. With the current implementation that means I
>> need one optimizer per thread.
> How bad is that, actually?  That is what I am not getting.  Why do
> you need a thread-safe facade when all you are doing is creating new
> instances per thread?  Sorry if I am missing something simple here.
> <side rant>Historically, we did not care about thread-safety at all
> in [math], assuming the standard use case was *always* going to be
> one instance per thread.  The statistics aggregators are an example
> where multithreaded access makes sense, but this is much more the
> exception than the rule in [math].  I would really like to get clear
> on which classes really need to be threadsafe themselves rather than
> blindly assuming that all do and insisting that everything be
> immutable so that we don't have to think about how to make things
> threadsafe.</side rant>
> Phil

How bad is it to use a library though wrapper classes? Well it is
annoying to have to design my own optimization API that delegates to the
C-M implementations. From the posts on the users list I think other
people were confused by the old API and probably developed their own
wrappers to insulate themselves from it. When I saw the optimization
package was being redesigned in an backward incompatible way, I thought
it was an opportunity to address a range of concerns that are easiest to
address early in the design process. Luc's original fluent + immutable
proposal would have allowed thread safety. (If not initially, it could
be implemented through patches with no change to the API.) The
separation of algorithm and data would allow each to be much simpler
while still allowing for thread safety and a fluent API. (I.e. GN would
not need 5 superclasses and fields for storing upper and lower bounds.)

With the API re-design you have the opportunity to design for thread
safety, which can be hard to add on to an existing API.

Thanks for the whole discussion; It has given me some new ideas. :)


>>  Since I don't want to hard code the
>> optimizer as a dependency (GN and LM both have their advantages) I
>> created an optimizer interface that was thread safe. Under the hood, the
>> thread safe optimize method just creates a new instance of GN or LM.
>> Immutability, a builder API, or the copy method would all allow direct
>> use of the optimizer. I've been pushing the immutable method because I
>> think it is simpler to understand its behavior. With the "separation of
>> concerns" approach described above I think the optimizer could be easily
>> made immutable. The NLLSProblem class would be easy to make immutable +
>> fluent as well since it is a concrete class. (though it could have a
>> builder or a mutable + fluent API)
>> Best Regards,
>> Evan
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message