commons-dev mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] Java version
Date Fri, 16 Jan 2015 14:47:58 GMT
On Fri, 16 Jan 2015 10:09:02 +0100, Thomas Neidhart wrote:
> On 01/16/2015 01:30 AM, Gilles wrote:
>> On Thu, 15 Jan 2015 15:41:11 -0700, Phil Steitz wrote:
>>> On 1/15/15 2:24 PM, Thomas Neidhart wrote:
>>>> On 01/08/2015 12:34 PM, Gilles wrote:
>>>>> Hi.
>>>>>
>>>>> Raising this issue once again.
>>>>> Are we going to upgrade the requirement for the next major release?
>>>>>
>>>>   [ ] Java 5
>>>>   [x] Java 6
>>>>   [x] Java 7
>>>>   [ ] Java 8
>>>>   [ ] Java 9
>>>>
>>>> A while ago I thought that it would be cool to switch to Java 7/8 for
>>>> some of the nice new features (mainly fork/join, lambda expressions and
>>>> diamond operator, the rest is more or less unimportant for math imho).
>>>>
>>>> But after some thoughts I think they are not really needed for the
>>>> following reasons:
>>>>
>>>>  * the main focus of math is on developing high-quality, well tested and
>>>> documented algorithms, the existing language features are more than
>>>> enough for this
>>
>> Sure.
>> Not so long ago, some people were claiming that nothing beats
>> programming in "assembly" language.
>>
>>> +1
>>>>
>>>>  * coming up with multi-threaded algorithms might be appealing but it is
>>>> also hard work and I wonder if it really makes sense in the times of
>>>> projects like mahout / hadoop / ... which aim for even better
>>>> scalability
>>>
>>> +1
>>
>> Hard work / easy work.  Yes and no.  It depends on the motivation
>> of the contributor. Or we have to (re)define clearly the scope of
>> CM, and start some serious clean-up.
>> It's not all black or white; I'm quite convinced that it's better
>> to handle multi-threading externally when the core computation is
>> sequential.  But CM already contains algorithms that are inherently
>> parallel (a.o. genetic algorithms) and improvement in those areas
>> would undoubtedly benefit from (internal) parallel processing.
>
> I think the better approach is to support external parallelization
> rather than trying to do it yourself. From a user POV, I would be scared
> to use a library that does some kind of parallelization internally which
> I can not control.
>
> Some recent examples show how it can be done better: there were some
> requests to make some of the statistics related classes map/reducible so
> that they can be used in Java 8 parallel streams.
>
> @genetic algorithms: there are far better libraries out there for
> this area and the support we have in math is really very simplistic. You
> can basically do just a few demo examples with it and I am more in favor
> of deprecating the package.

I pointed that out quite some time ago, but the deprecation idea was
rejected outright. [And further work was done on the package.]
This is IMO a major problem with CM: too many things are kept even
though there are no known users.
No user = no real-world testing = no improvement

>>> My HO is we should focus on getting the best single-threaded
>>> implementations we can and, where possible, setting things up to be
>>> executed in parallel by other engines.  Spawning and managing
>>> threads internal to [math] actually *reduces* the range of
>>> applicability of our stuff.
>>
>> Examples?
>
> because not everybody wants a library to do parallel stuff internally.
> Just imagine math being used in a web-application deployed together with
> many other applications. It is clearly not an option that one
> application might take over most/all of the available processors.

I agree, but this is a practical problem.
Is finding a solution inherently impossible?
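
To make the question concrete, here is a minimal sketch of one possible
direction, under stated assumptions: the library never creates threads
itself and only uses an executor handed in by the caller, so a web
application can cap the number of processors used. The class
ExternalExecutorSketch and the method evaluateAll are hypothetical names
for illustration; nothing like this exists in CM, and Java 8 is assumed
for the lambda syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration only; not part of Commons Math. */
public class ExternalExecutorSketch {

    /**
     * Evaluates f at every point using only the threads of the
     * caller-supplied executor. The library code spawns no thread.
     */
    public static double[] evaluateAll(DoubleUnaryOperator f,
                                       double[] points,
                                       ExecutorService executor) {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (double p : points) {
            tasks.add(() -> f.applyAsDouble(p));
        }
        double[] result = new double[points.length];
        try {
            // invokeAll returns the futures in the same order as the tasks.
            List<Future<Double>> futures = executor.invokeAll(tasks);
            for (int i = 0; i < result.length; i++) {
                result[i] = futures.get(i).get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Evaluation interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Evaluation failed", e.getCause());
        }
        return result;
    }

    public static void main(String[] args) {
        // The application, not the library, decides how many threads exist.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            double[] y = evaluateAll(x -> x * x, new double[] {1.0, 2.0, 3.0}, pool);
            for (double v : y) {
                System.out.println(v);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether such a parameter belongs in a given CM API is of course a
separate design question; the sketch only shows that thread management
can stay in the caller's hands.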

>>>  Much better to let Hadoop / Mahout et
>>> al parallelize using fast and accurate piece parts that we can
>>> provide.
>>
>> Do they really do that?
>> [Or do they implement their own algorithms knowing that they must
>> be thread-safe (which is something we don't focus a lot on).]
>
> I guess they have mainly their own algorithms, but there are examples of
> our stuff being used (using the map/reduce paradigm).

OK. Then, I would conclude that implementing the correct interface(s)
to allow this usage _must_ be among the top (yet unwritten) rules
for new contributions to, and refactoring of, CM.

>
>>>  If there are parallel algorithms that we are really dying
>>> to implement directly, I would rather see that done in a way that
>>> encapsulates and enables externalization of the thread management.
>>>>
>>>>  * staying at Java 6/7 does not prevent users from using math in a
>>>> Java 8 environment if wanted
>>>
>>> +1 - the examples I have seen thus far are all things that could be
>>> done fairly easily with client code.  I know we don't all agree with
>>> this, but I think the biggest service we can provide to our user
>>> base is good, tested, supported implementations of standard
>>> algorithms.  I wish we could find a way to focus more on that and
>>> less on fiddling with the API or language features.
>
> +1, I have the impression that the more we try to *optimize* an API, the
> more we end up with an inferior solution (with a few exceptions).
>
> There is too much discussion about API design. We should have our best
> practices and use them to implement rock-solid algorithms, which is
> already difficult enough.

I agree.

> In the end it does not matter so much if you
> have a fluent API or whatever, as long as it calculates the correct
> result, and is easy to use, imho.

I don't agree. Maybe it doesn't matter for the users (although it should),
but it certainly does for the developers (maintenance, etc.).

[If the "form" did not matter, why do several programming languages
exist?]

>> The problem is that those discussions constantly mix considerations
>> about contents, with political moves that do not necessarily match.
>> For example, a statement about contents would be: CM only provides
>> implementations of sequential mathematical algorithms.
>> But recent political moves, like changing the version control system
>> or advertizing "free for all" commit rights, aim at increasing the
>> contributor base.
>
> I think these considerations are orthogonal:

It would be so easy if it were true, but there are interactions...

>
>  * what you want to do? aka scope of the projects
>  * how you want to do it?
>  * what infrastructure do you provide to your users/collaborators

I am trying to point out that the stated goal of gathering more
contributors does not match the overly cautious policy with regard
to the language evolution.

>> What about those people interested in API fixing and new language
>> features?  You'll make them want to contribute to another project.
>> Now that Java is, at last, beginning to catch up with other
>> languages incomparably more widely used in the scientific community,
>> Commons Math is discussing how far behind it is going to lag!
>
> Afaik the scientific community uses mainly python with its abundance of
> great tools. I think Java is better suited in an engineering context.

That's a digression.

The point is to find the right balance (make users happy, make developers
not too unhappy). But we must have facts to help determine a real balance,
not just a balance between opinions (which is unlikely to happen).
For example, I'd propose that we advertise a poll with several precise
questions, to collect statistics on various aspects that can influence
a roadmap, like:
  What package(s) of CM are you directly "import"ing in your applications?
  Which Java version are you using to develop applications that use CM?
  Are you going to upgrade your applications with each new release of CM?
  What do you miss most in CM?
  etc.

Short of doing it seriously, we might as well skip the divination
part. [That, IMHO, prevents CM from making progress (even through
mistakes, that's fine).]


Gilles


>
> Thomas
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

