commons-dev mailing list archives

From Thomas Neidhart <thomas.neidh...@gmail.com>
Subject Re: [Math] Java version
Date Fri, 16 Jan 2015 09:09:02 GMT
On 01/16/2015 01:30 AM, Gilles wrote:
> On Thu, 15 Jan 2015 15:41:11 -0700, Phil Steitz wrote:
>> On 1/15/15 2:24 PM, Thomas Neidhart wrote:
>>> On 01/08/2015 12:34 PM, Gilles wrote:
>>>> Hi.
>>>>
>>>> Raising this issue once again.
>>>> Are we going to upgrade the requirement for the next major release?
>>>>
>>>   [ ] Java 5
>>>   [x] Java 6
>>>   [x] Java 7
>>>   [ ] Java 8
>>>   [ ] Java 9
>>>
>>> A while ago I thought that it would be cool to switch to Java 7/8 for
>>> some of the nice new features (mainly fork/join, lambda expressions and
>>> diamond operator, the rest is more or less unimportant for math imho).
>>>
>>> But after some thoughts I think they are not really needed for the
>>> following reasons:
>>>
>>>  * the main focus of math is on developing high-quality, well tested and
>>> documented algorithms, the existing language features are more than
>>> enough for this
> 
> Sure.
> Not so long ago, some people were claiming that nothing beats
> programming in "assembly" language.
> 
>> +1
>>>
>>>  * coming up with multi-threaded algorithms might be appealing but it is
>>> also hard work and I wonder if it really makes sense in the times of
>>> projects like mahout / hadoop / ... which aim for even better
>>> scalability
>>
>> +1
> 
> Hard work / easy work.  Yes and no.  It depends on the motivation
> of the contributor. Or we have to (re)define clearly the scope of
> CM, and start some serious clean-up.
> It's not all black or white; I'm quite convinced that it's better
> to handle multi-threading externally when the core computation is
> sequential.  But CM already contains algorithms that are inherently
> parallel (a.o. genetic algorithms) and improvement in those areas
> would undoubtedly benefit from (internal) parallel processing.

I think the better approach is to support external parallelization
rather than trying to do it ourselves. From a user POV, I would be scared
to use a library that does some kind of parallelization internally that
I cannot control.

Some recent examples show how it can be done better: there were
requests to make some of the statistics-related classes map/reducible so
that they can be used in Java 8 parallel streams.
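
To make the map/reduce idea concrete, here is a minimal sketch of a
mergeable mean/variance accumulator. The class MergeableMoments and all of
its method names are hypothetical, made up for illustration; this is not an
existing Commons Math class. It assumes Java 8 and uses only
DoubleStream.collect from the JDK. The point is that the statistic only has
to offer "add one value" and "merge two partial results"; the stream
framework, not the library, decides how the work is split across threads.

```java
import java.util.stream.DoubleStream;

/** Hypothetical mergeable accumulator; not part of Commons Math. */
final class MergeableMoments {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    /** "Map" side: fold a single value into this partial result (Welford update). */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** "Reduce" side: merge another partial result into this one. */
    void combine(MergeableMoments other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Pairwise merge of the two partial sums of squared deviations.
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long getN() { return n; }

    double getMean() { return mean; }

    double getSampleVariance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    /** Works for sequential and parallel streams alike; the caller chooses. */
    static MergeableMoments of(DoubleStream values) {
        return values.collect(MergeableMoments::new,
                              MergeableMoments::accept,
                              MergeableMoments::combine);
    }
}
```

A caller who wants parallelism would write something like
MergeableMoments.of(DoubleStream.of(data).parallel()); a caller who does not
simply leaves out parallel(). The accumulator itself never starts a thread.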

@genetic algorithms: there are far better libraries out there for
this area, and the support we have in math is really very simplistic. You
can basically do just a few demo examples with it, and I am more in favor
of deprecating the package.

>> My HO is we should focus on getting the best single-threaded
>> implementations we can and, where possible, setting things up to be
>> executed in parallel by other engines.  Spawning and managing
>> threads internal to [math] actually *reduces* the range of
>> applicability of our stuff.
> 
> Examples?

Because not everybody wants a library to do parallel stuff internally.
Just imagine math being used in a web application deployed together with
many other applications. It is clearly not acceptable for one
application to take over most or all of the available processors.
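
As a rough sketch of what "external" thread management could look like:
the library-style part below is a plain sequential kernel, and the
parallel part takes an ExecutorService that the caller creates, sizes and
shuts down. The class ExternalParallelism and its methods are hypothetical
and only serve as an illustration (Java 8 syntax, JDK classes only); the
trapezoid rule is just a stand-in for any sequential algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical example; not part of Commons Math. */
final class ExternalParallelism {

    /** Sequential, stateless kernel (composite trapezoid rule). */
    static double integrate(DoubleUnaryOperator f, double a, double b, int steps) {
        double h = (b - a) / steps;
        double sum = 0.5 * (f.applyAsDouble(a) + f.applyAsDouble(b));
        for (int i = 1; i < steps; i++) {
            sum += f.applyAsDouble(a + i * h);
        }
        return sum * h;
    }

    /**
     * Splits [a, b] into chunks and runs the sequential kernel on each one
     * using the pool passed in. No threads are created here, so the caller
     * keeps full control over how many processors are used.
     */
    static double integrateInParallel(DoubleUnaryOperator f, double a, double b,
                                      int chunks, int stepsPerChunk,
                                      ExecutorService pool) {
        double width = (b - a) / chunks;
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            final double lo = a + c * width;
            final double hi = lo + width;
            tasks.add(() -> integrate(f, lo, hi, stepsPerChunk));
        }
        try {
            double total = 0;
            for (Future<Double> part : pool.invokeAll(tasks)) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while integrating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        }
    }
}
```

In a shared web container the application would hand in a small, bounded
pool (for example Executors.newFixedThreadPool(2)) and shut it down itself;
a batch job on a dedicated machine could pass a much larger one. The
numerical code is the same in both cases.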

>>  Much better to let Hadoop / Mahout et
>> al parallelize using fast and accurate piece parts that we can
>> provide.
> 
> Do they really do that?
> [Or do they implement their own algorithms knowing that they must
> be thread-safe (which is something we don't focus a lot on).]

I guess they mainly have their own algorithms, but there are examples of
our stuff being used (using the map/reduce paradigm).

>>  If there are parallel algorithms that we are really dying
>> to implement directly, I would rather see that done in a way that
>> encapsulates and enables externalization of the thread management.
>>>
>>>  * staying at Java 6/7 does not prevent users from using math in a
>>> Java 8 environment if wanted
>>
>> +1 - the examples I have seen thus far are all things that could be
>> done fairly easily with client code.  I know we don't all agree with
>> this, but I think the biggest service we can provide to our user
>> base is good, tested, supported implementations of standard
>> algorithms.  I wish we could find a way to focus more on that and
>> less on fiddling with the API or language features.

+1, I have the impression that the more we try to *optimize* an API, the
more likely we are to end up with an inferior solution (with a few
exceptions).

There is too much discussion about API design. We should have our best
practices and use them to implement rock-solid algorithms, which is
already difficult enough. In the end it does not matter so much if you
have a fluent API or whatever, as long as it calculates the correct
result, and is easy to use, imho.

> The problem is that those discussions constantly mix considerations
> about contents, with political moves that do not necessarily match.
> For example, a statement about contents would be: CM only provides
> implementations of sequential mathematical algorithms.
> But recent political moves, like changing the version control system
> or advertizing "free for all" commit rights, aim at increasing the
> contributor base.

I think these considerations are orthogonal:

 * what do you want to do? (aka the scope of the project)
 * how do you want to do it?
 * what infrastructure do you provide to your users/collaborators?

> What about those people interested in API fixing and new language
> features?  You'll make them want to contribute to another project.
> Now that Java is, at last, beginning to catch up with other
> languages incomparably more widely used in the scientific community,
> Commons Math is discussing how far behind it is going to lag!

Afaik the scientific community mainly uses Python with its abundance of
great tools. I think Java is better suited to an engineering context.

Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

