commons-dev mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] Moving on or not?
Date Thu, 07 Feb 2013 16:04:56 GMT
On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
> On 2/7/13 4:58 AM, Gilles wrote:
>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>> Hi.
>>>>>>
>>>>>> In the thread about "static import", Stephen noted that decisions
>>>>>> on a component's evolution are dependent on whether the future of
>>>>>> the Java language is taken into account, or not.
>>>>>> A question on the same theme also arose after the presentation of
>>>>>> Commons Math in FOSDEM 2013.
>>>>>>
>>>>>> If we assume that efficiency is among the important qualities for
>>>>>> Commons Math, the future is to allow usage of the tools provided
>>>>>> by the standard Java library in order to ease the development of
>>>>>> multi-threaded algorithms.
>>>>>>
>>>>>> Maintaining Java 1.5 source compatibility for the reason that we
>>>>>> may need to support legacy applications will turn out to be
>>>>>> self-defeating:
>>>>>> 1. New users will not consider Commons Math's features that are
>>>>>>    notably apt to parallel processing.
>>>>>> 2. Current users might at some point simply switch to another
>>>>>>    library if it proves more efficient (because it actually uses
>>>>>>    multi-threading).
>>>>>> 3. New Java developers will be turned away because they will want
>>>>>>    to use the more convenient features of the language in order to
>>>>>>    provide potential contributions.
>>>>>>
>>>>>> If maintaining 1.5 source compatibility is kept as a requirement,
>>>>>> the consequence is that Commons Math will _become_ a legacy
>>>>>> library.
>>>>>> In that perspective, implementing/improving algorithms for which a
>>>>>> parallel version is known to be more efficient is plainly a waste
>>>>>> of development and maintenance time.
>>>>>>
>>>>>> In order to mitigate the risks (both of upgrading and of not
>>>>>> upgrading the source compatibility requirement), I would propose
>>>>>> to create a new project (say, "Commons Math MT") where we could
>>>>>> implement new features[1] without being encumbered with the 1.5
>>>>>> requirement.[2]
>>>>>> The "Commons Math MT" would depend on "Commons Math" where we
>>>>>> would continue developing single-thread (and thread-safe) "tasks",
>>>>>> i.e. independent units of processing that could be used in
>>>>>> algorithms located in "Commons Math MT".
>>>>>>
>>>>>> In summary:
>>>>>> - Commons Math (as usual):
>>>>>>   * single-thread (sequential) algorithms,
>>>>>>   * (pure) Java 5,
>>>>>>   * no dependencies.
>>>>>> - Commons Math MT:
>>>>>>   * multi-thread (parallel) algorithms,
>>>>>>   * Java 7 and beyond,
>>>>>>   * JNI allowed,
>>>>>>   * dependencies allowed (jCuda).
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> There are several other possibilities to consider:
>>>>>
>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>> 1) Set things up within [math] to support parallel execution in
>>>>>    JDK 1.7, Hadoop or other frameworks
>>>>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>>>>
>>>>> I think we should maintain a version that has no dependencies and
>>>>> no JNI in any case.
>>>>>
>>>>> Starting a branch and getting concrete about how to parallelize
>>>>> some algorithms would be a good way to start.  One thing I have
>>>>> not really investigated and would be interested in details on is
>>>>> what you actually get in efficiency gain (or loss?) using fork /
>>>>> join vs just using 1.5+ concurrency for the kinds of problems we
>>>>> would end up using this stuff for.
>>>>>
>>>>> Thinking about specific parallelization problem instances would
>>>>> also help decide whether 1) makes sense (i.e., whether it makes
>>>>> sense as you mention above to maintain a single-threaded library
>>>>> that provides task execution for a multithreaded version or
>>>>> multithreaded frameworks).
>>>>>
>>>>> One more thing to consider is that for at least some users of
>>>>> [math], having the library internally spawn threads and/or peg
>>>>> multiple processors may not be desirable.  It is a little
>>>>> misleading to say that multithreading is the way to get
>>>>> "efficiency."  It is really the way to *use* more compute
>>>>> resources and unless there are real algorithmic improvements, the
>>>>> overall efficiency may actually be less, due to task coordination
>>>>> overhead.  What you get is faster execution due to more greedy
>>>>> utilization of available cores.  Actual efficiency (how much
>>>>> overall compute resource it takes to complete a job) partly
>>>>> depends on how efficiently the coordination itself is done (which
>>>>> JDK 1.7 claims to do very well - I have just not seen
>>>>> substantiation or any benchmarks demonstrating this) and how the
>>>>> parallelization affects overall compute requirements.  In any
>>>>> case, for environments where library thread-spawning is not
>>>>> desirable, I think we should maintain a single-threaded version.
>>>>>
>>>>
>>>> Unless I missed the point, those reasons are exactly why I propose
>>>> to have 2 projects/components. One, "Commons-Math", does not fiddle
>>>> with resources, while the other would provide a
>>>> "parallelizationLevel" setting for the algorithms written to
>>>> possibly take advantage of the Java 5+ "task framework".
>>>
>>> OK, what about the 4.x option?
>>>>
>>>> Yes, we could still be good by using only Java 5's concurrency
>>>> features but the issue I raise is not only about concurrency but
>>>> about evolution/progress/maintenance, all things that require
>>>> raising interest from new contributors (unless it's fine that
>>>> Commons Math be tagged as a "library of the past"...).
>>>
>>> +1 for experimenting with parallelization.  I would just like to
>>> understand if the JDK 7 stuff really adds much - in particular, does
>>> it handle coordination / cpu allocation better than you could easily
>>> do it with 1.5.  More supported JDKs == more potential users, so I
>>> like to see a real reason to bump the JDK level.
>>>>
>>>> But using concurrency features in "Commons Math" would also
>>>> contradict your own point ("we should maintain a single-threaded
>>>> version"): I agree, and that's why I proposed this other project...
>>>>
>>>> As for efficiency (or faster execution, if you want), I don't see
>>>> the point in doubting that tasks like global search (e.g. in a
>>>> genetic algorithm) will complete in less time when run in
>>>> parallel...
>>>>
>>>> As I summarized previously, having a "Commons Math MT" would bring
>>>> no inconvenience, contrary to either your points 0, 1, or 2. [No
>>>> inconvenience to me, that is, but to people with requirements like
>>>> "Java 5 compatible" or "no multi-threading".]
>>>> As I indicated, the basic "task" could be defined in "Commons Math"
>>>> and "Commons Math MT" would provide the parallelization "glue"
>>>> (e.g. to divide the search space of the GA).
>>>
>>> I think it is best at this point to cut a branch and actually start
>>> working on specific algorithms.  Having a set of candidate
>>> algorithms for parallelization will help us decide what we actually
>>> need and how it might work.  I would personally favor the 4.x
>>> approach, with thread-spawning behavior configurable.
>>
>> It seems fair to wait until parallel algorithms are actually
>> implemented.
>>
>> However it is not clear what you mean with "the 4.x approach": if it
>> is actually allowing Java 7, that would mean that, starting from 4.0,
>> we'll indeed drop support of earlier JVMs!
>> Why would this be preferred to having 2 projects? Of course, if
>> everyone agrees to that move to Java 7, that's fine. :-)
>
> What I meant was that instead of creating a new component, we would
> just create a new release line.  Like what tomcat does for servlet
> spec versions.  I guess this does mean that we end up having to
> stabilize the 3.x APIs because no additional "major" release would
> be allowed in that line.  That would be a *good thing* IMO as long
> as we can do it cleanly.  If not, maybe we end up having to use 5.x
> for the JDK 1.7+ version, using 4.0 to get to a stable API for the
> current trunk code.

There's still the human-resource problem: we don't have enough people to
maintain a single branch; having two will only make it worse.

>>
>> On the other hand, if we keep Java 5, at least until we get use cases
>> or contributions that would benefit from features in JDKs newer than
>> 1.5, there is no need to create a branch; we can just go on with
>> adding multi-thread codes to the trunk (to become part[1] of the
>> upcoming 3.x releases).
>
> That is why I wanted to get a feel for what the JDK 1.7 stuff really
> buys you.   Has anyone seen benchmarks showing better performance
> using 1.7 than can be obtained just using 1.5 concurrency
> primitives?

Again, there are separate issues:
  1. Coding in Java 7
  2. Running with the JVM shipped with JDK 1.7

The newer JVMs are faster, independently of whether new features of the
language are used.
But it could well be that some of the new features allow even better
performance (as is foreseen for Java 8).
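
To make "new features" concrete: the main concurrency addition in Java 7
is the fork/join framework (java.util.concurrent.ForkJoinPool and
RecursiveTask). Below is a minimal sketch of a divide-and-conquer sum.
The class name ForkJoinSum, the THRESHOLD value and the sum() helper are
made up for illustration, and nothing here measures whether it beats a
plain Java 5 executor.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration only; not part of Commons Math.
class ForkJoinSum extends RecursiveTask<Double> {
    private static final long serialVersionUID = 1L;

    // Arbitrary cutoff below which the range is summed sequentially.
    private static final int THRESHOLD = 1000;

    private final double[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(double[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            double s = 0;
            for (int i = from; i < to; i++) {
                s += data[i];
            }
            return s;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                      // schedule the left half asynchronously
        double rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;     // wait for the left half and combine
    }

    /** Sums the array using a pool with the requested number of workers. */
    static double sum(double[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] ones = new double[10000];
        java.util.Arrays.fill(ones, 1.0);
        System.out.println(sum(ones, 4));
    }
}
```

The same computation can be written with Java 5 executors by submitting
one Callable per chunk; what fork/join adds is recursive splitting and
work stealing between the pool's workers.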

> Has anyone used 1.7 to parallelize numerical algorithms
> and found it really easier / more performant?

Where are those people who could answer?
That is one of the points I raised. If we maintain source compatibility
with a language version that is more than 8 years old, not many
contributors are going to be interested, which reduces the chance of
getting answers...

> Any opinions /
> responses to Konstantin's comment on where parallelization should be
> implemented - i.e. in the library vs somewhere up the stack?

What was the _question_?  ...

>  Any
> ideas how to set things up so that [math] code can play nicely with
> concurrency frameworks?

That's a strange question in the context of a project that tries hard
not to have any dependency.
If the requirement is to depend only on the standard JDK, the framework
is in
  java.util.concurrent
and all we need to do is define "tasks" that can be "submitted" to
an executor:
  http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/AbstractExecutorService.html#submit(java.util.concurrent.Callable)
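
As a rough sketch of what such "tasks" and the parallelization "glue"
could look like with nothing beyond Java 5's java.util.concurrent: all
class and method names below are hypothetical (not existing Commons Math
API), and the objective function and grid size are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical names throughout; none of this is existing Commons Math API.
class ChunkedMinSearch {

    /** Single-threaded "task": minimum of f(x) = (x - 3)^2 on a grid over [lo, hi]. */
    static final class MinTask implements Callable<Double> {
        private final double lo;
        private final double hi;
        private final int steps;

        MinTask(double lo, double hi, int steps) {
            this.lo = lo;
            this.hi = hi;
            this.steps = steps;
        }

        public Double call() {
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i <= steps; i++) {
                double x = lo + (hi - lo) * i / steps;
                double v = (x - 3) * (x - 3);
                if (v < best) {
                    best = v;
                }
            }
            return Double.valueOf(best);
        }
    }

    /** The "glue": splits [lo, hi] into chunks and submits one task per chunk. */
    static double minimize(double lo, double hi, int chunks, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> results = new ArrayList<Future<Double>>();
            double width = (hi - lo) / chunks;
            for (int c = 0; c < chunks; c++) {
                double from = lo + c * width;
                results.add(executor.submit(new MinTask(from, from + width, 1000)));
            }
            double best = Double.POSITIVE_INFINITY;
            for (Future<Double> f : results) {
                best = Math.min(best, f.get().doubleValue()); // blocks until done
            }
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(minimize(0.0, 10.0, 4, 2));
    }
}
```

The task itself knows nothing about threads; only minimize() does. The
number of threads is a plain argument, so a caller who does not want
several threads spawned can limit the pool to a single worker.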

Regards,
Gilles



