commons-dev mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [Math] Moving on or not?
Date Fri, 08 Feb 2013 19:52:24 GMT
Hi Phil,

On 08/02/2013 15:20, Phil Steitz wrote:
> On 2/8/13 12:04 AM, Luc Maisonobe wrote:
>> On 08/02/2013 03:21, Konstantin Berlin wrote:
>>> Sorry, but none of this is making sense to me. We had a long discussion
>>> about how the library doesn't test for large scale problem
>>> performance. A lot of algorithms probably do not scale well as a
>>> result. There was talk of dropping sparse support in linear algebra.
>>> So instead of fixing that, you jump to parallelization, which is
>>> needed only for large scale problems, which this library does not
>>> handle well even in single thread right now.
>>>
>>> The most significant impact you can have is fixing the linear algebra
>>> component.
>> I agree with this. Also in order to avoid spreading our attention too
>> much on keeping several branches in sync, I would suggest to not create
>> a new component but directly decide we will not support Java 5 anymore
>> as of Apache Commons Math 4.0, so people can progressively use the new
>> features of the language and experiment directly on the trunk.
> 
> Actually, to get anything, you would need to bump to 1.7, abandoning
> 1.6 as well.

I did not know that. I don't yet know the Java 7 specifics, as the
projects I deal with are stuck on Java 6 at most.

> That would effectively mean abandoning a large segment
> (likely the majority) of the user base.  I would not like to do
> that.

I agree with this; Java 6 is important for now (though of course that
will no longer be true in a few years). So I veto my own proposal.

> So if we don't have the energy to maintain two lines, I would
> say hold off requiring Java 7 until we have stabilized the API and
> fixed things like above.

Then I think Gilles' proposal to have an experimental version is the best
way to go. This could be done exactly as was done for the BSP trees: a
sandbox component is started (Gilles can do that, as any commons
committer can start a new sandbox component) to experiment with Java 7
for some parallel algorithms (which may duplicate algorithms already in
[math] or be totally different and even address other use cases).
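
To make "experiment with Java 7 for some parallel algorithms" concrete,
here is a minimal sketch of the fork/join style that Java 7 adds in
java.util.concurrent. It is only an illustration: the class name
ForkJoinSumSketch, the nested SumTask and the THRESHOLD value are all
hypothetical and do not come from [math] or from any sandbox code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical sketch: sums an array with Java 7 fork/join. */
public class ForkJoinSumSketch {
    /** Below this size a slice is summed sequentially (arbitrary value). */
    private static final int THRESHOLD = 1000;

    /** Divide-and-conquer task over data[from, to). */
    static final class SumTask extends RecursiveTask<Double> {
        private static final long serialVersionUID = 1L;
        private final double[] data;
        private final int from;
        private final int to;

        SumTask(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Double compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: plain sequential loop.
                double sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                    // schedule the left half asynchronously
            double rightSum = right.compute(); // compute the right half in this thread
            return left.join() + rightSum;  // wait for the left half and combine
        }
    }

    /** Sums the array using a pool sized to the available processors. */
    public static double parallelSum(double[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this style is actually faster or easier than Java 5 executors
for numerical work is exactly the open question in this thread; the
sketch only shows what the code would look like.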

The decision about merging [math-MT] and [math] later on can be
postponed until the component is mature enough. If we decide to merge
them, it will be a new major version that could require Java 7. If we
decide not to merge, the Java version used for [math] would be
independent (but could also be Java 7 due to other needs that may arise).

This does not mean we ignore large scale problems; it simply means we
also explore the way Gilles suggests.

Konstantin, it would be nice if you could contribute some concrete ideas
and code if possible. As you have seen, we lack contributors in several
domains and this has led us to make some mistakes. Help is welcome.

best regards,
Luc

> 
> Phil
>>
>> best regards,
>> Luc
>>
>>> On Feb 7, 2013, at 5:06 PM, Gilles <gilles@harfang.homelinux.org> wrote:
>>>
>>>> On Thu, 07 Feb 2013 08:32:46 -0800, Phil Steitz wrote:
>>>>> On 2/7/13 8:04 AM, Gilles wrote:
>>>>>> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>>>>>>> On 2/7/13 4:58 AM, Gilles wrote:
>>>>>>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>>>>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>>>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>>>>>>> Hi.
>>>>>>>>>>>>
>>>>>>>>>>>> In the thread about "static import", Stephen noted that
>>>>>>>>>>>> decisions on a component's evolution are dependent on
>>>>>>>>>>>> whether the future of the Java language is taken into
>>>>>>>>>>>> account, or not.
>>>>>>>>>>>> A question on the same theme also arose after the
>>>>>>>>>>>> presentation of Commons Math in FOSDEM 2013.
>>>>>>>>>>>>
>>>>>>>>>>>> If we assume that efficiency is among the important
>>>>>>>>>>>> qualities for Commons Math, the future is to allow usage
>>>>>>>>>>>> of the tools provided by the standard Java library in
>>>>>>>>>>>> order to ease the development of multi-threaded
>>>>>>>>>>>> algorithms.
>>>>>>>>>>>>
>>>>>>>>>>>> Maintaining Java 1.5 source compatibility for the reason
>>>>>>>>>>>> that we may need to support legacy applications will turn
>>>>>>>>>>>> out to be self-defeating:
>>>>>>>>>>>> 1. New users will not consider Commons Math's features
>>>>>>>>>>>>    that are notably apt to parallel processing.
>>>>>>>>>>>> 2. Current users might at some point simply switch to
>>>>>>>>>>>>    another library if it proves more efficient (because it
>>>>>>>>>>>>    actually uses multi-threading).
>>>>>>>>>>>> 3. New Java developers will be turned away because they
>>>>>>>>>>>>    will want to use the more convenient features of the
>>>>>>>>>>>>    language in order to provide potential contributions.
>>>>>>>>>>>>
>>>>>>>>>>>> If maintaining 1.5 source compatibility is kept as a
>>>>>>>>>>>> requirement, the consequence is that Commons Math will
>>>>>>>>>>>> _become_ a legacy library.
>>>>>>>>>>>> In that perspective, implementing/improving algorithms
>>>>>>>>>>>> for which a parallel version is known to be more
>>>>>>>>>>>> efficient is plainly a waste of development and
>>>>>>>>>>>> maintenance time.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to mitigate the risks (both of upgrading and of
>>>>>>>>>>>> not upgrading the source compatibility requirement), I
>>>>>>>>>>>> would propose to create a new project (say, "Commons Math
>>>>>>>>>>>> MT") where we could implement new features[1] without
>>>>>>>>>>>> being encumbered with the 1.5 requirement.[2]
>>>>>>>>>>>> The "Commons Math MT" would depend on "Commons Math" where
>>>>>>>>>>>> we would continue developing single-thread (and
>>>>>>>>>>>> thread-safe) "tasks", i.e. independent units of
>>>>>>>>>>>> processing that could be used in algorithms located in
>>>>>>>>>>>> "Commons Math MT".
>>>>>>>>>>>>
>>>>>>>>>>>> In summary:
>>>>>>>>>>>> - Commons Math (as usual):
>>>>>>>>>>>>  * single-thread (sequential) algorithms,
>>>>>>>>>>>>  * (pure) Java 5,
>>>>>>>>>>>>  * no dependencies.
>>>>>>>>>>>> - Commons Math MT:
>>>>>>>>>>>>  * multi-thread (parallel) algorithms,
>>>>>>>>>>>>  * Java 7 and beyond,
>>>>>>>>>>>>  * JNI allowed,
>>>>>>>>>>>>  * dependencies allowed (jCuda).
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>> There are several other possibilities to consider:
>>>>>>>>>>>
>>>>>>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>>>>>>> 1) Set things up within [math] to support parallel
>>>>>>>>>>>    execution in JDK 1.7, Hadoop or other frameworks
>>>>>>>>>>> 2) Instead of a new project, start a 4.x branch targeting
>>>>>>>>>>>    JDK 1.7
>>>>>>>>>>>
>>>>>>>>>>> I think we should maintain a version that has no
>>>>>>>>>>> dependencies and no JNI in any case.
>>>>>>>>>>>
>>>>>>>>>>> Starting a branch and getting concrete about how to
>>>>>>>>>>> parallelize some algorithms would be a good way to start.
>>>>>>>>>>> One thing I have not really investigated and would be
>>>>>>>>>>> interested in details on is what you actually get in
>>>>>>>>>>> efficiency gain (or loss?) using fork / join vs just using
>>>>>>>>>>> 1.5+ concurrency for the kinds of problems we would end up
>>>>>>>>>>> using this stuff for.
>>>>>>>>>>>
>>>>>>>>>>> Thinking about specific parallelization problem instances
>>>>>>>>>>> would also help decide whether 1) makes sense (i.e.,
>>>>>>>>>>> whether it makes sense as you mention above to maintain a
>>>>>>>>>>> single-threaded library that provides task execution for a
>>>>>>>>>>> multithreaded version or multithreaded frameworks).
>>>>>>>>>>>
>>>>>>>>>>> One more thing to consider is that for at least some users
>>>>>>>>>>> of [math], having the library internally spawn threads
>>>>>>>>>>> and/or peg multiple processors may not be desirable.  It is
>>>>>>>>>>> a little misleading to say that multithreading is the way
>>>>>>>>>>> to get "efficiency."  It is really the way to *use* more
>>>>>>>>>>> compute resources and unless there are real algorithmic
>>>>>>>>>>> improvements, the overall efficiency may actually be less,
>>>>>>>>>>> due to task coordination overhead.  What you get is faster
>>>>>>>>>>> execution due to more greedy utilization of available
>>>>>>>>>>> cores.  Actual efficiency (how much overall compute
>>>>>>>>>>> resource it takes to complete a job) partly depends on how
>>>>>>>>>>> efficiently the coordination itself is done (which JDK 1.7
>>>>>>>>>>> claims to do very well - I have just not seen
>>>>>>>>>>> substantiation or any benchmarks demonstrating this) and
>>>>>>>>>>> how the parallelization affects overall compute
>>>>>>>>>>> requirements.  In any case, for environments where library
>>>>>>>>>>> thread-spawning is not desirable, I think we should
>>>>>>>>>>> maintain a single-threaded version.
>>>>>>>>>> Unless I missed the point, those reasons are exactly why I
>>>>>>>>>> propose to have 2 projects/components. One, "Commons-Math",
>>>>>>>>>> does not fiddle with resources, while the other would
>>>>>>>>>> provide a "parallelizationLevel" setting for the algorithms
>>>>>>>>>> written to possibly take advantage of the Java 5+ "task
>>>>>>>>>> framework".
>>>>>>>>> OK, what about the 4.x option?
>>>>>>>>>> Yes, we could still be good by using only Java 5's
>>>>>>>>>> concurrency features but the issue I raise is not only about
>>>>>>>>>> concurrency but about evolution/progress/maintenance, all
>>>>>>>>>> things that require raising interest from new contributors
>>>>>>>>>> (unless it's fine that Commons Math be tagged as a "library
>>>>>>>>>> of the past"...).
>>>>>>>>> +1 for experimenting with parallelization.  I would just like
>>>>>>>>> to understand if the JDK 7 stuff really adds much - in
>>>>>>>>> particular, does it handle coordination / cpu allocation
>>>>>>>>> better than you could easily do it with 1.5.  More supported
>>>>>>>>> JDKs == more potential users, so I like to see a real reason
>>>>>>>>> to bump the JDK level.
>>>>>>>>>> But using concurrency features in "Commons Math" would also
>>>>>>>>>> contradict your own point ("we should maintain a
>>>>>>>>>> single-threaded version"): I agree, and that's why I
>>>>>>>>>> proposed this other project...
>>>>>>>>>>
>>>>>>>>>> As for efficiency (or faster execution, if you want), I
>>>>>>>>>> don't see the point in doubting that tasks like global
>>>>>>>>>> search (e.g. in a genetic algorithm) will complete in less
>>>>>>>>>> time when run in parallel...
>>>>>>>>>>
>>>>>>>>>> As I summarized previously, having a "Commons Math MT" would
>>>>>>>>>> bring no inconvenience, contrary to either your points 0, 1,
>>>>>>>>>> or 2. [No inconvenience to me, that is, but to people with
>>>>>>>>>> requirements like "Java 5 compatible" or "no
>>>>>>>>>> multi-threading".]
>>>>>>>>>> As I indicated, the basic "task" could be defined in
>>>>>>>>>> "Commons Math" and "Commons Math MT" would provide the
>>>>>>>>>> parallelization "glue" (e.g. to divide the search space of
>>>>>>>>>> the GA).
>>>>>>>>> I think it is best at this point to cut a branch and actually
>>>>>>>>> start working on specific algorithms.  Having a set of
>>>>>>>>> candidate algorithms for parallelization will help us decide
>>>>>>>>> what we actually need and how it might work.  I would
>>>>>>>>> personally favor the 4.x approach, with thread-spawning
>>>>>>>>> behavior configurable.
>>>>>>>> It seems fair to wait until parallel algorithms are actually
>>>>>>>> implemented.
>>>>>>>>
>>>>>>>> However it is not clear what you mean with "the 4.x approach":
>>>>>>>> if it is actually allowing Java 7, that would mean that,
>>>>>>>> starting from 4.0, we'll indeed drop support of earlier JVMs!
>>>>>>>> Why would this be preferred to having 2 projects? Of course, if
>>>>>>>> everyone agrees to that move to Java 7, that's fine. :-)
>>>>>>> What I meant was that instead of creating a new component, we
>>>>>>> would just create a new release line.  Like what tomcat does for
>>>>>>> servlet spec versions.  I guess this does mean that we end up
>>>>>>> having to stabilize the 3.x APIs because no additional "major"
>>>>>>> release would be allowed in that line.  That would be a *good
>>>>>>> thing* IMO as long as we can do it cleanly.  If not, maybe we end
>>>>>>> up having to use 5.x for the JDK 1.7+ version, using 4.0 to get
>>>>>>> to a stable API for the current trunk code.
>>>>>> There's still the human resource problem: we don't have it to
>>>>>> maintain a single branch; having two will only make it worse.
>>>>> Yes, but the "new project" approach has the same problem.
>>>> Yes.
>>>> However, I meant it as a way to separate concerns, as shown
>>>> by diverging opinions, even in the few people who take part
>>>> in this discussion or in previous ones about the same subject.
>>>>
>>>> A sibling (not separate!) project could allow interested
>>>> people to experiment while not adding yet another "distraction"
>>>> to the main project, where people more focused on the
>>>> mathematical (for lack of a better word) side can continue
>>>> their own improvements.
>>>> A healthy interaction could even come out of having a "public"
>>>> use-case in the form of a project that needs certain facilities
>>>> (algorithms as tasks) in order to provide multi-thread
>>>> utilities to users (who might prefer not to have to implement
>>>> them themselves at a higher level).
>>>>
>>>>>>>> On the other hand, if we keep Java 5, at least until we get use
>>>>>>>> cases or contributions that would benefit from features in JDKs
>>>>>>>> newer than 1.5, there is no need to create a branch; we can just
>>>>>>>> go on with adding multi-thread codes to the trunk (to become
>>>>>>>> part[1] of the upcoming 3.x releases).
>>>>>>> That is why I wanted to get a feel for what the JDK 1.7 stuff
>>>>>>> really buys you.   Has anyone seen benchmarks showing better
>>>>>>> performance using 1.7 than can be obtained just using 1.5
>>>>>>> concurrency primitives?
>>>>>> Again, there are separate issues:
>>>>>> 1. Coding in Java 7
>>>>>> 2. Running with the JVM shipped with JDK 1.7
>>>>>>
>>>>>> The newer JVMs are faster, independently of whether new features
>>>>>> of the language are used.
>>>>>> But it could well be that some of the new features allow even better
>>>>>> performance (as is foreseen for Java 8).
>>>>> Agreed.  I am interested in understanding better both how much
>>>>> easier it actually is to code and whether the 1.7 framework
>>>>> materially improves scheduling / allocation over what you could do
>>>>> just using 1.5 primitives.
>>>> I cannot provide proof, but nor is anyone on this list
>>>> eager to prove the contrary; hence the proposal to set
>>>> up a "playground".
>>>>
>>>>>>> Has anyone used 1.7 to parallelize numerical algorithms
>>>>>>> and found it really easier / more performant?
>>>>>> Where are those people who could answer?
>>>>> This is a public list :)
>>>>>> That is one of the points I raised. If we maintain source
>>>>>> compatibility with a language version that is 9 years old, not many
>>>>>> contributors are going to be interested. Thus reducing the chance
>>>>>> to get answers...
>>>>>>
>>>>>>> Any opinions / responses to Konstantin's comment on where
>>>>>>> parallelization should be implemented - i.e. in the library vs
>>>>>>> somewhere up the stack?
>>>>>> What was the _question_?  ...
>>>>> The question he implicitly raised was whether or not it makes sense
>>>>> for a low-level library to parallelize tasks / run across cores.
>>>> In several areas, CM is not a low-level library (GA, multi-start
>>>> optimizers for example). In other areas like FFT, a user can
>>>> legitimately expect top performance without having to handle
>>>> parallelization by himself.
>>>>
>>>>> This is a legitimate question.  It may be better actually to set
>>>>> things up so that higher-level frameworks or applications can
>>>>> arrange parallel execution rather than embedding it in the low-level
>>>>> library itself.  This is also what I was referring to when I said
>>>>> that in some contexts, thread-spawning / cpu hogging may not be
>>>>> desirable.
>>>> For several cases (GA, FFT, multi-start optimizers), I have the
>>>> opposite viewpoint: multi-threading is an implementation detail,
>>>> that could be handled at a _lower_ level. Of course, the user can
>>>> decide whether to enable more than one thread.
>>>>
>>>>>>> Any ideas how to set things up so that [math] code can play
>>>>>>> nicely with concurrency frameworks?
>>>>>> That's a strange question in the context of a project that tries
>>>>>> hard not to have any dependency.
>>>>> I did not mean necessarily to bring in dependencies; but rather to
>>>>> make it easy for computational tasks executed by [math] code to be
>>>>> managed by external concurrency frameworks, e.g. Hadoop.
>>>> In the context of Commons Math, we often heard that "no dependency"
>>>> is good. Then, it is also good to not impose _implicit_ dependencies
>>>> (like: "If you use Hadoop, you could have better performance"). In a
>>>> way, the CM development "model" is: "We provide a toolkit of efficient
>>>> procedures, and you, the user, get top performance (on a best effort
>>>> basis of course)."
>>>> If we can provide better performance through multi-threading, why not?
>>>> Nobody will be forced to use it: they will use the "basic" (sequential)
>>>> tasks, or set the "parallelizationLevel" setting to 1.
>>>>
>>>> Gilles
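
The "parallelizationLevel" idea above can be sketched as follows. This
is only an illustration under stated assumptions: the class
MultiStartSketch, the LocalSearch task (a fixed-step descent on the toy
objective f(x) = (x - 2)^2) and the bestOf method are hypothetical names,
not an existing Commons Math API. With a level of 1 the tasks run in the
caller's thread and no thread is spawned; with a higher level the same
tasks are handed to a fixed-size pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical multi-start search with a "parallelizationLevel" setting. */
public class MultiStartSketch {
    /** Sequential, self-contained "task": descent on f(x) = (x - 2)^2. */
    static final class LocalSearch implements Callable<Double> {
        private final double start;

        LocalSearch(double start) {
            this.start = start;
        }

        public Double call() {
            double x = start;
            for (int i = 0; i < 200; i++) {
                x -= 0.1 * 2.0 * (x - 2.0); // step along the negative gradient
            }
            return Double.valueOf(x);
        }
    }

    /** Toy objective used to rank the results. */
    static double objective(double x) {
        return (x - 2.0) * (x - 2.0);
    }

    /** Runs one search per start point and returns the best end point. */
    public static double bestOf(double[] starts, int parallelizationLevel) {
        if (starts.length == 0) {
            throw new IllegalArgumentException("at least one start point is required");
        }
        List<Double> results = new ArrayList<Double>();
        if (parallelizationLevel <= 1) {
            // Sequential mode: the library spawns no thread at all.
            for (double s : starts) {
                results.add(new LocalSearch(s).call());
            }
        } else {
            ExecutorService pool = Executors.newFixedThreadPool(parallelizationLevel);
            try {
                List<Future<Double>> futures = new ArrayList<Future<Double>>();
                for (double s : starts) {
                    futures.add(pool.submit(new LocalSearch(s)));
                }
                for (Future<Double> f : futures) {
                    results.add(f.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                pool.shutdown();
            }
        }
        double best = results.get(0);
        for (double r : results) {
            if (objective(r) < objective(best)) {
                best = r;
            }
        }
        return best;
    }
}
```

The sketch uses only Java 5 constructs, so it says nothing about
whether Java 7 would do the scheduling better.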
>>>>
>>>>> Phil
>>>>>> If the requirement is to only depend on the standard JDK: the
>>>>>> framework is in
>>>>>> java.util.concurrent
>>>>>> and all we need to do is to define "tasks" that can be "submitted"
>>>>>> to an executor:
>>>>>>
>>>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/AbstractExecutorService.html#submit(java.util.concurrent.Callable)
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Gilles
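
A minimal sketch of a "task" submitted to an executor, using only
java.util.concurrent from the standard JDK. The names
TaskSubmissionSketch, IntegrateSquare and integrate are hypothetical and
chosen for illustration only. The executor is supplied by the caller, so
the code below spawns no thread on its own; that is one possible way to
let an application or a framework stay in charge of scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical sketch: independent tasks submitted to a caller-supplied executor. */
public class TaskSubmissionSketch {
    /** Task: midpoint-rule integral of x^2 over [a, b]. */
    static final class IntegrateSquare implements Callable<Double> {
        private final double a;
        private final double b;

        IntegrateSquare(double a, double b) {
            this.a = a;
            this.b = b;
        }

        public Double call() {
            int n = 1000;
            double h = (b - a) / n;
            double sum = 0;
            for (int i = 0; i < n; i++) {
                double x = a + (i + 0.5) * h; // midpoint of the i-th subinterval
                sum += x * x;
            }
            return Double.valueOf(sum * h);
        }
    }

    /** Splits [a, b] into equal pieces, submits one task per piece, adds the results. */
    public static double integrate(ExecutorService executor, double a, double b, int pieces) {
        List<Future<Double>> futures = new ArrayList<Future<Double>>();
        double width = (b - a) / pieces;
        for (int i = 0; i < pieces; i++) {
            double lo = a + i * width;
            futures.add(executor.submit(new IntegrateSquare(lo, lo + width)));
        }
        double total = 0;
        try {
            for (Future<Double> f : futures) {
                total += f.get(); // blocks until that piece is done
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total;
    }
}
```

A caller would create the executor itself, for example with
Executors.newFixedThreadPool(2), pass it to integrate, and shut it down
afterwards.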
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>