commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [Math] Cleaning up the curve fitters
Date Fri, 19 Jul 2013 17:21:54 GMT
As I said above, let's focus on actual technical discussion here. 
We implement standard, well-documented algorithms.  We need to
provide references and convince ourselves that what we release is
numerically sound, well-documented and well-tested.  We do our best
with the volunteer resources we have.  Your help and contributions
are appreciated.

Phil

On 7/19/13 9:44 AM, Ajo Fod wrote:
> Hi,
>
> I very much appreciate the work that has been done in CM and this is
> precisely why I'd like more people to contribute. Even when you didn't
> accept my MATH-995 patch, I got useful input from Konstantin and it has
> already made my application more efficient.
>
> What you required of me in the Improper integral example was a comparison
> of different methods. This sort of research takes time. I hear that Gilles
> is working on it. I appreciate that you guys spent so much effort on this.
>
> However, my contention is that your efforts at researching alternate
> solutions to a patch are not justified until you come up with a test that
> the patch fails OR you know an alternative that performs better for an
> application you have. In the first case, you're losing the efficiency of open source by
> reinventing a possibly different wheel without sufficient marginal reward.
> In the second case, beware of the fact that numerical algorithms are hairy
> beasts, and it takes a while to encode something new. The efficiency of
> commons comes from putting the burden of development on the developers who
> need the code.
>
> So, I propose an alternate approach to testing if a submitted patch needs
> to be accepted:
> 1. Check if the patch fills a gap in existing CM code.
> 2. If so, check if it passes known tests.
> 3. If so, write up alternate tests to see if the code breaks.
> 4. If it does not break, wrap the code up in a suitable API and accept the patch.
>
> This has two advantages. First CM will have more capabilities per unit of
> your precious time. Second you give people the feeling that they are making
> a difference.
>
> As far as the debate on AQ(AdaptiveQuadrature) vs
> LGQ(IterativeLegendreGaussIntegrator) goes:
> The FACTS that support AQ over LGQ are:
> 1. An example where LGQ failed and AQ succeeded. I also explained why LGQ
> fails and AQ will probably converge more correctly. Generally, adaptive
> quadrature methods are known to be so successful at integration that
> Konstantin even wondered why we don't have something yet.
> 2. Efficiency improvement: I also showed that AQ is more efficient in at
> least one example in terms of accuracy in digits per function evaluation.
> So, conversely, it's now your turn to provide concrete examples where LGQ
> does better than AQ. You could pose credible objections by providing
> examples where:
> 1. AQ fails but LGQ passes.
> 2. LGQ is more efficient in accuracy per evaluation.
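The "accuracy in digits per function evaluation" metric mentioned above can be measured with a small harness. What follows is only a sketch under stated assumptions: it is written in Python rather than against the Commons Math API, the adaptive Simpson routine is a textbook scheme and not the AdaptiveQuadrature patch from MATH-995, and the peaked test integrand and the names `CountingFunction`, `adaptive_simpson`, and `measure` are hypothetical choices made for illustration.

```python
import math


class CountingFunction:
    """Wraps a function and counts how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.evaluations = 0

    def __call__(self, x):
        self.evaluations += 1
        return self.f(x)


def _simpson(fa, fm, fb, a, b):
    # Simpson's rule on [a, b] from the endpoint and midpoint values.
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb)


def _recurse(f, a, m, b, fa, fm, fb, whole, tol, depth):
    lm, rm = 0.5 * (a + m), 0.5 * (m + b)
    flm, frm = f(lm), f(rm)
    left = _simpson(fa, flm, fm, a, m)
    right = _simpson(fm, frm, fb, m, b)
    delta = left + right - whole
    # Accept when the two estimates agree, or when the depth budget is spent.
    if depth <= 0 or abs(delta) <= 15.0 * tol:
        return left + right + delta / 15.0
    return (_recurse(f, a, lm, m, fa, flm, fm, left, tol / 2.0, depth - 1)
            + _recurse(f, m, rm, b, fm, frm, fb, right, tol / 2.0, depth - 1))


def adaptive_simpson(f, a, b, tol=1e-8, max_depth=50):
    """Textbook adaptive Simpson quadrature of f over [a, b]."""
    m = 0.5 * (a + b)
    fa, fm, fb = f(a), f(m), f(b)
    whole = _simpson(fa, fm, fb, a, b)
    return _recurse(f, a, m, b, fa, fm, fb, whole, tol, max_depth)


EPS = 0.01  # hypothetical peak width for the test integrand


def peaked(x):
    # Narrow peak at x = 0; the exact integral over [-1, 1] is known.
    return 1.0 / (EPS * EPS + x * x)


EXACT = 2.0 / EPS * math.atan(1.0 / EPS)


def measure():
    """Returns (absolute error, evaluations, digits per evaluation)."""
    f = CountingFunction(peaked)
    estimate = adaptive_simpson(f, -1.0, 1.0)
    error = abs(estimate - EXACT)
    relative = max(error / abs(EXACT), 1e-16)  # avoid log10(0)
    return error, f.evaluations, -math.log10(relative) / f.evaluations
```

Calling `measure()` for each candidate integrator on the same set of integrands would give the kind of side-by-side numbers requested in this thread. A single integrand, as here, shows how the measurement works but says little about which method is better in general.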
>
> All of this is to illustrate, with an example, where the perception that it
> is hard to convince the gatekeepers of commons of the merits of a patch
> comes from. I have a package in my codebase with assorted patches that I
> just don't think are worth the time to try to post to commons. I think it
> is very inefficient if others have such private patches.
>
> Cheers,
> Ajo
>
> On Thu, Jul 18, 2013 at 2:15 PM, Phil Steitz <phil.steitz@gmail.com> wrote:
>
>> On 7/18/13 1:48 PM, Ajo Fod wrote:
>>> Hello folks,
>>>
>>> There is a lot of work in API design. However, Konstantin's point is that
>>> it takes a lot of effort to convince Gilles of any alternatives. API design
>>> issues should really be second to functionality. This idea seems to be lost
>>> in conversations.
>> With patience and collaboration you can have both and we *need* to
>> have both.  You can't get to a stable API and approachable and
>> maintainable code base without thinking carefully about API design.
>>> I agree with Gilles that providing tests and benchmarks that exhibit the
>>> advantages of a particular method are probably the best way to show other
>>> contributors the value of an alternative approach.
>> There is some value to this, but honestly much more value in
>> carefully researching and presenting the numerical analysis to
>> support improvement / performance claims.
>>> It is quite depressing to the contributor to see one's contribution be
>>> rejected when efficiency/accuracy improvements are demonstrated.
>> What you "demonstrated" in one case was better performance in one
>> problem instance.  The change of variable approach you implemented
>> was, in my admittedly possibly naive numerics view, questionable.  I
>> asked to see numerical analysis support and no one provided that.
>> Had you provided that, I would have argued to include some version
>> of the patch.
>>
>>> In a
>>> better world, rejecting a patch that passes the hurdle of demonstrating an
>>> efficiency improvement over existing code should come with a responsibility
>>> of showing alternate tests that the patch fails and the original code
>>> passes. Otherwise, the patch should be accepted by default. The person who
>>> commits or designed the API is free to make changes to fit API design.
>> This is essentially what Gilles ended up doing.  You may not agree
>> with the approach, but he did in fact address the core issue.
>>> Just like API designers are not experts at the underlying math,
>>> contributors are not necessarily experts at the underlying API design. To
>>> unlock the efficiency of open source, contributor morale needs to be
>>> considered and classes that pass tests should really be accepted.
>> I agree that we should try to be friendly and encouraging and I
>> apologize if we have not been so.  That said, the process of
>> contributing here is not just tossing patches over the wall.  First
>> you need to get community support for the ideas.  Then work
>> collaboratively to get patches that work for the code and community.
>>> For example, performance AND accuracy improvements to the existing algorithm
>>> were demonstrated for AdaptiveQuadrature in my patches to MATH-995.
>> Sorry, I was not convinced by the accuracy and performance claims
>> and, as I said above, I suspect that the change of variable approach
>> may not be the best way to handle improper integrals.  I am not
>> claiming authority here - just - again - asking for real numerical
>> analysis arguments to support the claims you are making.
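For readers unfamiliar with the technique under discussion, here is a minimal sketch of a change of variable for an improper integral. The thread does not say which substitution the MATH-995 patch used, so the mapping x = t / (1 - t^2) below is an assumption chosen for illustration, as are the function names and the plain midpoint rule. It is written in Python and does not use any Commons Math API.

```python
import math


def integrate_real_line(f, n=2000):
    """Midpoint-rule estimate of the integral of f over (-inf, +inf).

    Uses the substitution x = t / (1 - t^2), which maps (-1, 1) onto the
    real line, with dx = (1 + t^2) / (1 - t^2)^2 dt.
    """
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        t = -1.0 + (i + 0.5) * h  # midpoints never touch t = -1 or t = 1
        one_minus_t2 = 1.0 - t * t
        x = t / one_minus_t2
        jacobian = (1.0 + t * t) / (one_minus_t2 * one_minus_t2)
        total += f(x) * jacobian
    return total * h


def gaussian(x):
    # Test integrand whose integral over the real line is sqrt(pi).
    return math.exp(-x * x)
```

For the Gaussian, `integrate_real_line(gaussian)` can be compared with `math.sqrt(math.pi)`. How well such a substitution behaves depends on how fast the integrand decays, which is the kind of numerical analysis question raised above.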
>>
>> It would be a lot better if we focused discussion on the actual
>> technical issues and mathematical principles rather than
>> generalities about how hard / easy it is to get stuff in.
>>
>> Phil
>>> The only joy I got out of that was Gilles putting a comment in the docs for
>>> the existing class:
>>> "The Javadoc now draws attention that the [existing] algorithm is not 100%
>>> fool-proof."!
>>> Also, I was asked to open a new issue about Adaptive Quadratures to figure
>>> out what is the best quadrature method ... all while a patch that is a clear
>>> improvement over existing code wastes away. Why not accept the patch and
>>> make improvements as necessary?
>>>
>>> My impression since that patch was rejected, is that it just seems like a
>>> huge hurdle to get any patch past the API design requirements that are
>>> frankly not as clear to me as they are to the designer. I can see how others
>>> feel the same way.
>>>
>>> Cheers,
>>> Ajo.
>>>
>>> Gilles: if you don't want to end up spending time developing Gauss-Hermite
>>> quadrature or something else you don't really need, perhaps you should
>>> consider accepting/modifying code that was shown to work by someone who
>>> needed that functionality. It is reasonable to develop alternatives to fix
>>> flaws/gaps, but otherwise it's your effort wasted.  If someone's
>>> contribution doesn't fit your view of the API, then by all means edit the
>>> patch, but if you go about rejecting things that work, there won't be as
>>> many contributors to CM.
>>>
>>>
>>> On Thu, Jul 18, 2013 at 10:08 AM, Roger L. Whitcomb <
>>> Roger.Whitcomb@actian.com> wrote:
>>>
>>>> As an outsider listening to these discussions, it seems like:
>>>> a) *IF* there are problems with the current arrangement of packages, APIs,
>>>> or whatever, then a constructive approach would be for the one who sees
>>>> such problems to take the time to not just criticize and point out "flaws",
>>>> but to dig in and rearrange the packages, redo the APIs, provide unit
>>>> tests, and submit a patch with these changes, along with quantitative
>>>> justification, benchmarks, test cases, etc.  It is quite easy to criticize,
>>>> from the sidelines, the one who is actually doing the work, but quite
>>>> another matter to roll up your sleeves and join in the work....
>>>> b) Since Math is a "library", it seems like there needs to be
>>>> implementations of many different algorithms, since (quite clearly) not
>>>> every algorithm is suited to every problem.  To say that X method doesn't
>>>> work well for problem Y, is not necessarily a reason to rewrite X method,
>>>> if that method is correctly implementing the algorithm.  Maybe the
>>>> algorithm is simply not the right one to use for the problem.
>>>> c) Comments that imply (or state outright) that someone who has (clearly)
>>>> done a lot of work has done it "...without much thinking..." are clearly
>>>> out of line.  In my experience, the only reason to resort to name calling
>>>> and character assassination is because one has no worthy arguments to put
>>>> forward.
>>>> d) Kudos to the Commons committers who have been doing the work ...
>>>>
>>>> My 2 cents...
>>>>
>>>> ~Roger Whitcomb
>>>> Apache Pivot PMC Chair
>>>>
>>>> -----Original Message-----
>>>> From: Gilles [mailto:gilles@harfang.homelinux.org]
>>>> Sent: Thursday, July 18, 2013 9:35 AM
>>>> To: dev@commons.apache.org
>>>> Subject: Re: [Math] Cleaning up the curve fitters
>>>>
>>>> On Thu, 18 Jul 2013 11:47:03 -0400, Konstantin Berlin wrote:
>>>>> I appreciate the comment. I would like to help, but currently my
>>>>> schedule is full. Maybe towards the end of the year.
>>>>>
>>>>> I think the first approach should be do no harm. The optimization
>>>>> package keeps getting refactored every few months without much
>>>>> thinking involved. We had this discussion previously, with Gilles
>>>>> unilaterally deciding on the current tree, which he now wants to
>>>>> change again.
>>>> As I said,
>>>> as Luc said,
>>>> as Phil said,
>>>> again and again and again,
>>>> we are not optimization (as a scientific field) experts here, but we do
>>>> use Commons Math in scientific code that is pretty compute intensive (and
>>>> yes, maybe not in the same sense as you'd like it to be for your comfort).
>>>> Current code has, and may still have problems, but we see them only
>>>> through running unit tests, running our applications, running code examples
>>>> submitted by issue reporters.
>>>> We improve what we can, given time and motivation constraints.
>>>> Other than that, there is nothing.
>>>>
>>>> Yes, we already had that asymmetrical conversation where _you_ declare
>>>> what _we_ should do.
>>>>
>>>>> As someone who uses optimization regularly, I would say the current API
>>>>> state (not just package naming) leaves a lot to be desired, and is not
>>>>> amenable to the various modifications that people might need for larger
>>>>> problems. So if you are going to modify it, you should at least open
>>>>> up the API to the possibility that different optimization steps can be
>>>>> done using various techniques, depending on the problem.
>>>>>
>>>>> We should also accept that not everything can fit neatly into a
>>>>> package tree and a single set of APIs. A good example is least
>>>>> squares. Linear least squares does not require an initial guess at a
>>>>> solution, and by performing decomposition ahead of time you can
>>>>> quickly recompute the solution given different input values. However,
>>>>> an iterative least squares method might not have these properties.
>>>>> There are probably countless other examples.
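The point about linear least squares can be made concrete with a short sketch: factor once for a fixed design, then solve cheaply for each new set of observations, with no initial guess needed. This is an illustration only, in Python and not using the Commons Math API. The class name `PrefactoredLineFit` is hypothetical, and the Cholesky factorization of the normal equations is used for brevity; a QR decomposition is generally the more numerically robust choice.

```python
import math


class PrefactoredLineFit:
    """Least-squares fit of y = c0 + c1 * x for fixed abscissae xs.

    The 2x2 normal-equations matrix [[n, sx], [sx, sxx]] is Cholesky
    factored once in the constructor; fit() reuses that factor.
    """

    def __init__(self, xs):
        self.xs = list(xs)
        n = float(len(self.xs))
        sx = sum(self.xs)
        sxx = sum(x * x for x in self.xs)
        # Lower-triangular factor L with L * L^T equal to the matrix above.
        self.l11 = math.sqrt(n)
        self.l21 = sx / self.l11
        self.l22 = math.sqrt(sxx - self.l21 * self.l21)

    def fit(self, ys):
        """Returns (intercept, slope) for observations ys at the stored xs."""
        b1 = sum(ys)
        b2 = sum(x * y for x, y in zip(self.xs, ys))
        # Forward substitution: L z = b.
        z1 = b1 / self.l11
        z2 = (b2 - self.l21 * z1) / self.l22
        # Back substitution: L^T c = z.
        slope = z2 / self.l22
        intercept = (z1 - self.l21 * slope) / self.l11
        return intercept, slope
```

For example, `fitter = PrefactoredLineFit([0, 1, 2, 3])` followed by `fitter.fit([1, 3, 5, 7])` and `fitter.fit([2, 1, 0, -1])` reuses the same factor for both data sets. An iterative nonlinear solver has no equivalent of this step, which is the asymmetry the paragraph above describes.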
>>>>>
>>>>> Because optimization problems are really computationally hard, all the
>>>>> little specific differences matter. That is why Gilles's approach of
>>>>> sweeping everything under the rug and into some rigid not thought out
>>>>> hierarchical API forces these methods to adapt (or drop) numerical
>>>>> aspects that should not be there (e.g. polynomial fits). This has
>>>>> *huge* performance implications, but the issue is treated as some OO
>>>>> design 101 class, with the focus on how to force everything into a
>>>>> simple inheritance structure, numerics be damned.
>>>>>
>>>>> I would gladly help with the feedback when I can. Ajo and I provided
>>>>> code for adaptive integration, yet the whole issue was completely
>>>>> ignored. So I am not sure how much effort is required for the
>>>>> developers to take an idea or mostly completed code and make a change,
>>>>> rather than reject even the most basic numerical approaches that are
>>>>> taught in introduction classes as something that needs to be
>>>>> benchmarked.
>>>> As usual, you are mixing everything, from algorithms to implementations,
>>>> from proposing new features to denigrating existing ones (with non-existent
>>>> or inappropriate use-cases), from numerical to efficiency considerations...
>>>> [On top of it, you blatantly affirm that this issue has been ignored, even
>>>> as I provided[1] an analysis[2] of what was actually happening.
>>>> People like you seem to ignore that we work as volunteers on this project!]
>>>> Not even speaking of derogatory remarks like "sweeping [...] under the rug"
>>>> and "not thought out" and insinuating that everything was better and more
>>>> efficient before. Which is simply not true.
>>>>
>>>> It's an asymmetrical discussion because you declare that half-baked code
>>>> is good enough and _we_ have to work even more than if we had to
>>>> implement the feature from scratch.
>>>>
>>>>
>>>> Gilles
>>>>
>>>> [1] In the spare time I do _not_ have either.
>>>> [2] Which dragged me to the implementation of the Gauss-Hermite quadrature
>>>>      scheme (although I had no personal use of it), which seems to be the
>>>>      appropriate way to deal with the improper integral reported in the
>>>>      issue which you refer to.
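As background on the scheme named in footnote [2]: Gauss-Hermite quadrature approximates integrals over the whole real line of the form g(x) * exp(-x^2) by a weighted sum of g at fixed nodes. The sketch below hard-codes the standard 3-point rule, which is exact when g is a polynomial of degree 5 or less. It is written in Python as an illustration and is not the Commons Math implementation. The function name is hypothetical.

```python
import math

# Standard 3-point Gauss-Hermite rule for the weight function exp(-x^2):
# nodes are the roots of the Hermite polynomial H_3.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def gauss_hermite_3(g):
    """Approximates the integral of g(x) * exp(-x^2) over (-inf, +inf)."""
    return sum(w * g(x) for x, w in zip(NODES, WEIGHTS))
```

For instance, `gauss_hermite_3(lambda x: x * x)` reproduces sqrt(pi) / 2, the exact value of the integral of x^2 * exp(-x^2) over the real line. Because the weight function absorbs the Gaussian decay, no truncation or change of variable is needed for integrands of this form.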
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>


