commons-dev mailing list archives

From: Phil Steitz <phil.ste...@gmail.com>
Subject: Re: [Math] Moving on or not?
Date: Wed, 06 Feb 2013 17:46:55 GMT

On 2/6/13 9:03 AM, Gilles wrote:
> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>> On 2/5/13 6:08 AM, Gilles wrote:
>>> Hi.
>>>
>>> In the thread about "static import", Stephen noted that decisions
>>> on a component's evolution are dependent on whether the future of
>>> the Java language is taken into account, or not.
>>> A question on the same theme also arose after the presentation of
>>> Commons Math in FOSDEM 2013.
>>>
>>> If we assume that efficiency is among the important qualities for
>>> Commons Math, the future is to allow usage of the tools provided
>>> by the standard Java library in order to ease the development of
>>> multi-threaded algorithms.
>>>
>>> Maintaining Java 1.5 source compatibility for the reason that we
>>> may need to support legacy applications will turn out to be
>>> self-defeating:
>>> 1. New users will not consider Commons Math's features that are
>>>    notably apt to parallel processing.
>>> 2. Current users might at some point simply switch to another
>>>    library if it proves more efficient (because it actually uses
>>>    multi-threading).
>>> 3. New Java developers will be turned away because they will want
>>>    to use the more convenient features of the language in order to
>>>    provide potential contributions.
>>>
>>> If maintaining 1.5 source compatibility is kept as a requirement,
>>> the consequence is that Commons Math will _become_ a legacy
>>> library.
>>> In that perspective, implementing/improving algorithms for which a
>>> parallel version is known to be more efficient is plainly a waste
>>> of development and maintenance time.
>>>
>>> In order to mitigate the risks (both of upgrading and of not
>>> upgrading the source compatibility requirement), I would propose
>>> to create a new project (say, "Commons Math MT") where we could
>>> implement new features[1] without being encumbered with the 1.5
>>> requirement.[2]
>>> The "Commons Math MT" would depend on "Commons Math" where we would
>>> continue developing single-thread (and thread-safe) "tasks", i.e.
>>> independent units of processing that could be used in algorithms
>>> located in "Commons Math MT".
>>>
>>> In summary:
>>> - Commons Math (as usual):
>>>   * single-thread (sequential) algorithms,
>>>   * (pure) Java 5,
>>>   * no dependencies.
>>> - Commons Math MT:
>>>   * multi-thread (parallel) algorithms,
>>>   * Java 7 and beyond,
>>>   * JNI allowed,
>>>   * dependencies allowed (jCuda).
>>>
>>> What do you think?
>>
>> There are several other possibilities to consider:
>>
>> 0) Implement multithreading using JDK 1.5 primitives
>> 1) Set things up within [math] to support parallel execution in JDK
>> 1.7, Hadoop or other frameworks
>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>
>> I think we should maintain a version that has no dependencies and no
>> JNI in any case.
>>
>> Starting a branch and getting concrete about how to parallelize some
>> algorithms would be a good way to start.  One thing I have not
>> really investigated and would be interested in details on is what
>> you actually get in efficiency gain (or loss?) using fork / join vs
>> just using 1.5+ concurrency for the kinds of problems we would end
>> up using this stuff for.
>>
>> Thinking about specific parallelization problem instances would also
>> help decide whether 1) makes sense (i.e., whether it makes sense as
>> you mention above to maintain a single-threaded library that
>> provides task execution for a multithreaded version or multithreaded
>> frameworks).
>>
>> One more thing to consider is that for at least some users of
>> [math], having the library internally spawn threads and/or peg
>> multiple processors may not be desirable.  It is a little misleading
>> to say that multithreading is the way to get "efficiency."  It is
>> really the way to *use* more compute resources and unless there are
>> real algorithmic improvements, the overall efficiency may actually
>> be less, due to task coordination overhead.  What you get is faster
>> execution due to more greedy utilization of available cores.  Actual
>> efficiency (how much overall compute resource it takes to complete a
>> job) partly depends on how efficiently the coordination itself is
>> done (which JDK 1.7 claims to do very well - I have just not seen
>> substantiation or any benchmarks demonstrating this) and how the
>> parallelization affects overall compute requirements.  In any case,
>> for environments where library thread-spawning is not desirable, I
>> think we should maintain a single-threaded version.
>>
>
> Unless I missed the point, those reasons are exactly why I propose to
> have 2 projects/components. One, "Commons-Math", does not fiddle with
> resources, while the other would provide a "parallelizationLevel"
> setting for the algorithms written to possibly take advantage of the
> Java 5+ "task framework".

OK, what about the 4.x option?
>
> Yes, we could still be good by using only Java 5's concurrency
> features but the issue I raise is not only about concurrency but
> about evolution/progress/maintenance, all things that require
> raising interest from new contributors (unless it's fine that
> Commons Math be tagged as a "library of the past"...).

+1 for experimenting with parallelization.  I would just like to
understand if the JDK 7 stuff really adds much - in particular, does
it handle coordination / CPU allocation better than you could easily
do it with 1.5?  More supported JDKs == more potential users, so I
would like to see a real reason to bump the JDK level.
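
To make that comparison concrete, here is a minimal sketch of the same
computation (a sum of squares over an array) written once with the
Java 5 ExecutorService and once with the Java 7 fork/join framework.
All class and method names are made up for illustration and nothing
here is [math] code. The chunk-per-thread split and the THRESHOLD
value are arbitrary assumptions, and the fork/join half needs a
Java 7 runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration class; not part of Commons Math.
final class SumOfSquaresSketch {

    /** Sequential baseline over the half-open range [from, to). */
    static double sequential(double[] x, int from, int to) {
        double s = 0;
        for (int i = from; i < to; i++) {
            s += x[i] * x[i];
        }
        return s;
    }

    /** Java 5 style: fixed thread pool, one Callable per contiguous chunk. */
    static double withExecutor(final double[] x, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            int chunk = (x.length + nThreads - 1) / nThreads;
            for (int start = 0; start < x.length; start += chunk) {
                final int from = start;
                final int to = Math.min(x.length, start + chunk);
                parts.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        return sequential(x, from, to);
                    }
                }));
            }
            double total = 0;
            for (Future<Double> f : parts) {
                total += f.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Java 7 style: recursive splitting, executed by a ForkJoinPool. */
    static final class SquaresTask extends RecursiveTask<Double> {
        private static final long serialVersionUID = 1L;
        private static final int THRESHOLD = 10000; // arbitrary cut-off
        private final double[] x;
        private final int from;
        private final int to;

        SquaresTask(double[] x, int from, int to) {
            this.x = x;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Double compute() {
            if (to - from <= THRESHOLD) {
                return sequential(x, from, to);
            }
            int mid = (from + to) >>> 1;
            SquaresTask left = new SquaresTask(x, from, mid);
            left.fork(); // schedule the left half asynchronously
            double right = new SquaresTask(x, mid, to).compute();
            return right + left.join();
        }
    }

    static double withForkJoin(double[] x, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SquaresTask(x, 0, x.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing both variants against sequential() on realistic problem sizes
is the kind of measurement I am asking for. Note that the two
versions add the partial sums in different orders, so for general
floating-point input the results can differ in the last bits.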
>
> But using concurrency features in "Commons Math" would also
> contradict your own point ("we should maintain a single-threaded
> version"): I agree, and that's why I proposed this other project...
>
> As for efficiency (or faster execution, if you want), I don't see the
> point in doubting that tasks like global search (e.g. in a genetic
> algorithm) will complete in less time when run in parallel...
>
> As I summarized previously, having a "Commons Math MT" would bring no
> inconvenience, contrary to either your points 0, 1, or 2. [Not an
> inconvenience to me, that is, but to people with requirements like
> "Java 5 compatible" or "no multi-threading".]
> As I indicated, the basic "task" could be defined in "Commons Math"
> and "Commons Math MT" would provide the parallelization "glue" (e.g.
> to divide the search space of the GA).

I think it is best at this point to cut a branch and actually start
working on specific algorithms.  Having a set of candidate
algorithms for parallelization will help us decide what we actually
need and how it might work.  I would personally favor the 4.x
approach, with thread-spawning behavior configurable.
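
One possible shape for "configurable thread-spawning", as a minimal
sketch only: the unit of work is a plain single-threaded Callable, and
the caller decides whether it runs in the calling thread or on an
ExecutorService that the caller owns. The class, the method names, the
toy objective function and the "null executor means sequential"
convention are all assumptions made up for this example. It uses only
Java 5 APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of Commons Math.
final class ConfigurableSearchSketch {

    /** Toy objective function with its minimum value 1 at x = 2. */
    static double objective(double x) {
        return (x - 2) * (x - 2) + 1;
    }

    /** Single-threaded "task": scan one slice [lo, hi] of the search space. */
    static Callable<Double> sliceMinimum(final double lo, final double hi,
                                         final int samples) {
        return new Callable<Double>() {
            public Double call() {
                double best = Double.POSITIVE_INFINITY;
                for (int i = 0; i <= samples; i++) {
                    double x = lo + (hi - lo) * i / samples;
                    best = Math.min(best, objective(x));
                }
                return best;
            }
        };
    }

    /**
     * Splits [lo, hi] into slices and returns the smallest sampled value.
     * If executor is null, everything runs in the calling thread and the
     * library spawns no threads; otherwise the slices are submitted to
     * the executor, which the caller creates and shuts down.
     */
    static double minimize(double lo, double hi, int slices,
                           int samplesPerSlice, ExecutorService executor) {
        List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
        double width = (hi - lo) / slices;
        for (int k = 0; k < slices; k++) {
            tasks.add(sliceMinimum(lo + k * width, lo + (k + 1) * width,
                                   samplesPerSlice));
        }
        double best = Double.POSITIVE_INFINITY;
        try {
            if (executor == null) {
                for (Callable<Double> t : tasks) {
                    best = Math.min(best, t.call());
                }
            } else {
                for (Future<Double> f : executor.invokeAll(tasks)) {
                    best = Math.min(best, f.get());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return best;
    }
}
```

With this arrangement a user who does not want the library to touch
threads passes null, and a user who does passes, for example,
Executors.newFixedThreadPool(n) and keeps control of the pool size
and lifecycle.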

Phil
>
>
> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

