commons-dev mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [Math] Moving on or not?
Date Fri, 08 Feb 2013 08:04:51 GMT
Le 08/02/2013 03:21, Konstantin Berlin a écrit :
> Sorry, but none of this is making sense to me. We had a long discussion
> about how the library doesn't test for large-scale problem
> performance. A lot of algorithms probably do not scale well as a
> result. There was talk of dropping sparse support in linear algebra.
> So instead of fixing that, you jump to parallelization, which is
> needed only for large scale problems, which this library does not
> handle well even in single thread right now.
> 
> The most significant impact you can have is fixing the linear algebra
> component.

I agree with this. Also in order to avoid spreading our attention too
much on keeping several branches in sync, I would suggest to not create
a new component but directly decide we will not support Java 5 anymore
as of Apache Commons Math 4.0, so people can progressively use the new
features of the language and experiment directly on the trunk.

best regards,
Luc

> 
> On Feb 7, 2013, at 5:06 PM, Gilles <gilles@harfang.homelinux.org> wrote:
> 
>> On Thu, 07 Feb 2013 08:32:46 -0800, Phil Steitz wrote:
>>> On 2/7/13 8:04 AM, Gilles wrote:
>>>> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>>>>> On 2/7/13 4:58 AM, Gilles wrote:
>>>>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>>>>> Hi.
>>>>>>>>>>
>>>>>>>>>> In the thread about "static import", Stephen noted that
>>>>>>>>>> decisions on a component's evolution are dependent on whether
>>>>>>>>>> the future of the Java language is taken into account, or not.
>>>>>>>>>> A question on the same theme also arose after the presentation
>>>>>>>>>> of Commons Math in FOSDEM 2013.
>>>>>>>>>>
>>>>>>>>>> If we assume that efficiency is among the important qualities
>>>>>>>>>> for Commons Math, the future is to allow usage of the tools
>>>>>>>>>> provided by the standard Java library in order to ease the
>>>>>>>>>> development of multi-threaded algorithms.
>>>>>>>>>>
>>>>>>>>>> Maintaining Java 1.5 source compatibility for the reason that
>>>>>>>>>> we may need to support legacy applications will turn out to be
>>>>>>>>>> self-defeating:
>>>>>>>>>> 1. New users will not consider Commons Math's features that
>>>>>>>>>>   are notably apt to parallel processing.
>>>>>>>>>> 2. Current users might at some point simply switch to another
>>>>>>>>>>   library if it proves more efficient (because it actually
>>>>>>>>>>   uses multi-threading).
>>>>>>>>>> 3. New Java developers will be turned away because they will
>>>>>>>>>>   want to use the more convenient features of the language in
>>>>>>>>>>   order to provide potential contributions.
>>>>>>>>>>
>>>>>>>>>> If maintaining 1.5 source compatibility is kept as a
>>>>>>>>>> requirement, the consequence is that Commons Math will
>>>>>>>>>> _become_ a legacy library.
>>>>>>>>>> In that perspective, implementing/improving algorithms for
>>>>>>>>>> which a parallel version is known to be more efficient is
>>>>>>>>>> plainly a waste of development and maintenance time.
>>>>>>>>>>
>>>>>>>>>> In order to mitigate the risks (both of upgrading and of not
>>>>>>>>>> upgrading the source compatibility requirement), I would
>>>>>>>>>> propose to create a new project (say, "Commons Math MT") where
>>>>>>>>>> we could implement new features[1] without being encumbered
>>>>>>>>>> with the 1.5 requirement.[2]
>>>>>>>>>> The "Commons Math MT" would depend on "Commons Math" where we
>>>>>>>>>> would continue developing single-thread (and thread-safe)
>>>>>>>>>> "tasks", i.e. independent units of processing that could be
>>>>>>>>>> used in algorithms located in "Commons Math MT".
>>>>>>>>>>
>>>>>>>>>> In summary:
>>>>>>>>>> - Commons Math (as usual):
>>>>>>>>>>  * single-thread (sequential) algorithms,
>>>>>>>>>>  * (pure) Java 5,
>>>>>>>>>>  * no dependencies.
>>>>>>>>>> - Commons Math MT:
>>>>>>>>>>  * multi-thread (parallel) algorithms,
>>>>>>>>>>  * Java 7 and beyond,
>>>>>>>>>>  * JNI allowed,
>>>>>>>>>>  * dependencies allowed (jCuda).
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> There are several other possibilities to consider:
>>>>>>>>>
>>>>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>>>>> 1) Set things up within [math] to support parallel execution in
>>>>>>>>>    JDK 1.7, Hadoop or other frameworks
>>>>>>>>> 2) Instead of a new project, start a 4.x branch targeting JDK
>>>>>>>>>    1.7
>>>>>>>>>
>>>>>>>>> I think we should maintain a version that has no dependencies
>>>>>>>>> and no
>>>>>>>>> JNI in any case.
>>>>>>>>>
>>>>>>>>> Starting a branch and getting concrete about how to parallelize
>>>>>>>>> some algorithms would be a good way to start.  One thing I have
>>>>>>>>> not really investigated and would be interested in details on
>>>>>>>>> is what you actually get in efficiency gain (or loss?) using
>>>>>>>>> fork / join vs just using 1.5+ concurrency for the kinds of
>>>>>>>>> problems we would end up using this stuff for.
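
For readers unfamiliar with the fork / join style being compared here, the following is a minimal sketch of what it looks like in JDK 1.7, using only `java.util.concurrent.ForkJoinPool` and `RecursiveTask`. The class name `ForkJoinSumSketch`, the `THRESHOLD` value, and the array-summing workload are assumptions chosen for illustration; nothing here comes from Commons Math, and the sketch makes no claim about performance relative to a plain 1.5 executor.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical illustration only; not part of Commons Math. */
class ForkJoinSumSketch {

    /** Below this size a range is summed sequentially (arbitrary value). */
    static final int THRESHOLD = 1000;

    /** Recursively splits the range [from, to) and sums the halves. */
    static final class SumTask extends RecursiveTask<Double> {
        private static final long serialVersionUID = 1L;
        private final double[] data;
        private final int from;
        private final int to;

        SumTask(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Double compute() {
            if (to - from <= THRESHOLD) {
                double sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                          // schedule the left half asynchronously
            double rightSum = right.compute();    // work on the right half in this thread
            return left.join() + rightSum;        // wait for the left half and combine
        }
    }

    /** Sums the array using a pool with the requested number of worker threads. */
    static double parallelSum(double[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = new double[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```

Whether this recursive splitting schedules work better than fixed chunks submitted to a 1.5 executor is exactly the open question in the paragraph above; answering it would need benchmarks on the actual algorithms.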
>>>>>>>>>
>>>>>>>>> Thinking about specific parallelization problem instances would
>>>>>>>>> also help decide whether 1) makes sense (i.e., whether it makes
>>>>>>>>> sense as you mention above to maintain a single-threaded
>>>>>>>>> library that provides task execution for a multithreaded
>>>>>>>>> version or multithreaded frameworks).
>>>>>>>>>
>>>>>>>>> One more thing to consider is that for at least some users of
>>>>>>>>> [math], having the library internally spawn threads and/or peg
>>>>>>>>> multiple processors may not be desirable.  It is a little
>>>>>>>>> misleading to say that multithreading is the way to get
>>>>>>>>> "efficiency." It is really the way to *use* more compute
>>>>>>>>> resources and unless there are real algorithmic improvements,
>>>>>>>>> the overall efficiency may actually be less, due to task
>>>>>>>>> coordination overhead.  What you get is faster execution due to
>>>>>>>>> more greedy utilization of available cores. Actual efficiency
>>>>>>>>> (how much overall compute resource it takes to complete a job)
>>>>>>>>> partly depends on how efficiently the coordination itself is
>>>>>>>>> done (which JDK 1.7 claims to do very well - I have just not
>>>>>>>>> seen substantiation or any benchmarks demonstrating this) and
>>>>>>>>> how the parallelization affects overall compute requirements.
>>>>>>>>> In any case, for environments where library thread-spawning is
>>>>>>>>> not desirable, I think we should maintain a single-threaded
>>>>>>>>> version.
>>>>>>>>
>>>>>>>> Unless I missed the point, those reasons are exactly why I
>>>>>>>> propose to have 2 projects/components. One, "Commons-Math", does
>>>>>>>> not fiddle with resources, while the other would provide a
>>>>>>>> "parallelizationLevel" setting for the algorithms written to
>>>>>>>> possibly take advantage of the Java 5+ "task framework".
>>>>>>>
>>>>>>> OK, what about the 4.x option?
>>>>>>>>
>>>>>>>> Yes, we could still be good by using only Java 5's concurrency
>>>>>>>> features but the issue I raise is not only about concurrency but
>>>>>>>> about evolution/progress/maintenance, all things that require
>>>>>>>> raising interest from new contributors (unless it's fine that
>>>>>>>> Commons Math be tagged as a "library of the past"...).
>>>>>>>
>>>>>>> +1 for experimenting with parallelization.  I would just like to
>>>>>>> understand if the JDK 7 stuff really adds much - in particular,
>>>>>>> does it handle coordination / cpu allocation better than you
>>>>>>> could easily do it with 1.5.  More supported JDKs == more
>>>>>>> potential users, so I like to see a real reason to bump the JDK
>>>>>>> level.
>>>>>>>>
>>>>>>>> But using concurrency features in "Commons Math" would also
>>>>>>>> contradict
>>>>>>>> your own point ("we should maintain a single-threaded
>>>>>>>> version"): I
>>>>>>>> agree,
>>>>>>>> and that's why I proposed this other project...
>>>>>>>>
>>>>>>>> As for efficiency (or faster execution, if you want), I don't
>>>>>>>> see the point in doubting that tasks like global search (e.g. in
>>>>>>>> a genetic algorithm) will complete in less time when run in
>>>>>>>> parallel...
>>>>>>>>
>>>>>>>> As I summarized previously, having a "Commons Math MT" would
>>>>>>>> bring no inconvenience, contrary to either your points 0, 1, or
>>>>>>>> 2. [No inconvenience to me, that is, but to people with
>>>>>>>> requirements like "Java 5 compatible" or "no multi-threading".]
>>>>>>>> As I indicated, the basic "task" could be defined in "Commons
>>>>>>>> Math" and "Commons Math MT" would provide the parallelization
>>>>>>>> "glue" (e.g. to divide the search space of the GA).
>>>>>>>
>>>>>>> I think it is best at this point to cut a branch and actually
>>>>>>> start
>>>>>>> working on specific algorithms.  Having a set of candidate
>>>>>>> algorithms for parallelization will help us decide what we
>>>>>>> actually
>>>>>>> need and how it might work.  I would personally favor the 4.x
>>>>>>> approach, with thread-spawning behavior configurable.
>>>>>>
>>>>>> It seems fair to wait until parallel algorithms are actually
>>>>>> implemented.
>>>>>>
>>>>>> However it is not clear what you mean with "the 4.x approach": if
>>>>>> it is
>>>>>> actually allowing Java 7, that would mean that, starting from 4.0,
>>>>>> we'll
>>>>>> indeed drop support of earlier JVMs!
>>>>>> Why would this be preferred to having 2 projects? Of course, if
>>>>>> everyone
>>>>>> agrees to that move to Java 7, that's fine. :-)
>>>>>
>>>>> What I meant was that instead of creating a new component, we would
>>>>> just create a new release line.  Like what tomcat does for servlet
>>>>> spec versions.  I guess this does mean that we end up having to
>>>>> stabilize the 3.x APIs because no additional "major" release would
>>>>> be allowed in that line.  That would be a *good thing* IMO as long
>>>>> as we can do it cleanly.  If not, maybe we end up having to use 5.x
>>>>> for the JDK 1.7+ version, using 4.0 to get to a stable API for the
>>>>> current trunk code.
>>>>
>>>> There's still the human resource problem: we don't have it to
>>>> maintain
>>>> a single branch; having two will only make it worse.
>>>
>>> Yes, but the "new project" approach has the same problem.
>>
>> Yes.
>> However, I meant it as a way to separate concerns, as shown
>> by diverging opinions, even in the few people who take part
>> in this discussion or in previous ones about the same subject.
>>
>> A sibling (not separate!) project could allow interested
>> people to experiment while not adding yet another "distraction"
>> to the main project, where people more focused on the
>> mathematical (for lack of a better word) side can continue
>> their own improvements.
>> A healthy interaction could even come out of having a "public"
>> use-case in the form of a project that needs certain facilities
>> (algorithms as tasks) in order to provide multi-thread
>> utilities to users (who might prefer not to have to implement
>> them themselves at a higher level).
>>
>>>>>> On the other hand, if we keep Java 5, at least until we get use
>>>>>> cases or
>>>>>> contributions that would benefit from features in JDKs newer than
>>>>>> 1.5,
>>>>>> there is no need to create a branch; we can just go on with adding
>>>>>> multi-thread codes to the trunk (to become part[1] of the upcoming
>>>>>> 3.x
>>>>>> releases).
>>>>>
>>>>> That is why I wanted to get a feel for what the JDK 1.7 stuff really
>>>>> buys you.   Has anyone seen benchmarks showing better performance
>>>>> using 1.7 than can be obtained just using 1.5 concurrency
>>>>> primitives?
>>>>
>>>> Again, there are separate issues:
>>>> 1. Coding in Java 7
>>>> 2. Running with the JVM shipped with JDK 1.7
>>>>
>>>> The newer JVMs are faster, independently of whether new features
>>>> of the
>>>> language are used.
>>>> But it could well be that some of the new features allow even better
>>>> performance (as is foreseen for Java 8).
>>>
>>> Agreed.  I am interested in understanding better both how much
>>> easier it actually is to code and whether the 1.7 framework
>>> materially improves scheduling / allocation over what you could do
>>> just using 1.5 primitives.
>>
>> I cannot provide proof, but nor is anyone on this list
>> eager to prove the contrary; hence the proposal to set
>> up a "playground".
>>
>>>>> Has anyone used 1.7 to parallelize numerical algorithms
>>>>> and found it really easier / more performant?
>>>>
>>>> Where are those people who could answer?
>>>
>>> This is a public list :)
>>>> That is one of the points I raised. If we maintain source
>>>> compatibility
>>>> with a language version that is 9 years old, not many contributors
>>>> are
>>>> going to be interested. Thus reducing the chance to get answers...
>>>>
>>>>> Any opinions /
>>>>> responses to Konstantin's comment on where parallelization should be
>>>>> implemented - i.e. in the library vs somewhere up the stack?
>>>>
>>>> What was the _question_?  ...
>>>
>>> The question he implicitly raised was whether or not it makes sense
>>> for a low-level library to parallelize tasks / run across cores.
>>
>> In several areas, CM is not a low-level library (GA, multi-start
>> optimizers for example). In other areas like FFT, a user can
>> legitimately expect top performance without having to handle
>> parallelization by himself.
>>
>>> This is a legitimate question.  It may be better actually to set
>>> things up so that higher-level frameworks or applications can
>>> arrange parallel execution rather than embedding it in the low-level
>>> library itself.  This is also what I was referring to when I said
>>> that in some contexts, thread-spawning / cpu hogging may not be
>>> desirable.
>>
>> For several cases (GA, FFT, multi-start optimizers), I have the
>> opposite viewpoint: multi-threading is an implementation detail
>> that could be handled at a _lower_ level. Of course, the user can
>> decide whether to enable more than one thread.
>>
>>>>> Any
>>>>> ideas how to set things up so that [math] code can play nicely with
>>>>> concurrency frameworks?
>>>>
>>>> That's a strange question in the context of a project that tries hard
>>>> not to have any dependency.
>>>
>>> I did not mean necessarily to bring in dependencies; but rather to
>>> make it easy for computational tasks executed by [math] code to be
>>> managed by external concurrency frameworks, e.g. Hadoop.
>>
>> In the context of Commons Math, we often heard that "no dependency"
>> is good. Then, it is also good to not impose _implicit_ dependencies
>> (like: "If you use Hadoop, you could have better performance"). In a
>> way, the CM development "model" is: "We provide a toolkit of efficient
>> procedures, and you, the user, get top performance (on a best effort
>> basis of course)."
>> If we can provide better performance through multi-threading, why not?
>> Nobody will be forced to use it: they will use the "basic" (sequential)
>> tasks, or set the "parallelizationLevel" setting to 1.
>>
>> Gilles
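
The "parallelizationLevel" idea described above could be sketched roughly as follows. This is an assumption-laden illustration, not an existing Commons Math API: the class `MultiStartSketch`, the task `LocalSearchTask`, the toy objective `f`, and the crude step-halving search are all hypothetical. It only uses `java.util.concurrent` classes available since Java 5, and with the level set to 1 no thread is spawned at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these names exist in Commons Math. */
class MultiStartSketch {

    /** Toy objective with two local minima (the global one is near x = -1.30). */
    static double f(double x) {
        return x * x * x * x - 3 * x * x + x;
    }

    /** Sequential, thread-safe "task": a crude step-halving local search. */
    static final class LocalSearchTask implements Callable<Double> {
        private final double start;

        LocalSearchTask(double start) {
            this.start = start;
        }

        public Double call() {
            double x = start;
            double step = 0.5;
            while (step > 1e-9) {
                if (f(x + step) < f(x)) {
                    x += step;
                } else if (f(x - step) < f(x)) {
                    x -= step;
                } else {
                    step /= 2;
                }
            }
            return x;
        }
    }

    /** Runs one local search per start point and returns the best minimizer found. */
    static double minimize(double[] starts, int parallelizationLevel) {
        if (starts.length == 0 || parallelizationLevel < 1) {
            throw new IllegalArgumentException("need >= 1 start and level >= 1");
        }
        List<Double> candidates = new ArrayList<Double>();
        if (parallelizationLevel == 1) {
            // Sequential path: every task runs in the caller's thread.
            for (double s : starts) {
                candidates.add(new LocalSearchTask(s).call());
            }
        } else {
            ExecutorService executor = Executors.newFixedThreadPool(parallelizationLevel);
            try {
                List<Future<Double>> futures = new ArrayList<Future<Double>>();
                for (double s : starts) {
                    futures.add(executor.submit(new LocalSearchTask(s)));
                }
                for (Future<Double> future : futures) {
                    candidates.add(future.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                executor.shutdown();
            }
        }
        double best = candidates.get(0);
        for (double x : candidates) {
            if (f(x) < f(best)) {
                best = x;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] starts = {-2.0, -0.5, 0.5, 2.0};
        System.out.println(minimize(starts, 1)); // sequential
        System.out.println(minimize(starts, 4)); // up to 4 worker threads
    }
}
```

The same task class serves both paths, which is the separation argued for in this thread: the sequential unit of work stays usable on its own, and the threading "glue" is an optional layer the user switches on.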
>>
>>> Phil
>>>> If the requirement is to only depend on the standard JDK: the
>>>> framework
>>>> is in
>>>> java.util.concurrent
>>>> and all we need to do is to define "tasks" that can be "submitted" to
>>>> an executor:
>>>>
>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/AbstractExecutorService.html#submit(java.util.concurrent.Callable)
>>>>
>>>>
>>>> Regards,
>>>> Gilles
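
As a minimal sketch of the "define tasks, submit them to an executor" approach referred to above, the following uses only the Java 5 `java.util.concurrent` API (`Callable`, `ExecutorService.submit`, `Future.get`). The class `TaskSubmissionSketch`, the task `SumOfSquaresTask`, and the chunking scheme are assumptions made for illustration and are not taken from Commons Math.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; not part of Commons Math. */
class TaskSubmissionSketch {

    /** An independent unit of work: sums data[i]^2 over the range [from, to). */
    static final class SumOfSquaresTask implements Callable<Double> {
        private final double[] data;
        private final int from;
        private final int to;

        SumOfSquaresTask(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        public Double call() {
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i] * data[i];
            }
            return sum;
        }
    }

    /** Splits the array into at most {@code chunks} tasks and submits each to the executor. */
    static double sumOfSquares(double[] data, int chunks, ExecutorService executor) {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
        List<Future<Double>> partials = new ArrayList<Future<Double>>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            partials.add(executor.submit(new SumOfSquaresTask(data, start, end)));
        }
        double total = 0;
        try {
            for (Future<Double> partial : partials) {
                total += partial.get(); // blocks until that task has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println(sumOfSquares(data, 8, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the executor is passed in rather than created by the method, the caller keeps control over how many threads exist, which also speaks to the concern about a library spawning threads on its own.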
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>



