commons-dev mailing list archives

From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [Math] Moving on or not?
Date Wed, 06 Feb 2013 17:13:18 GMT
On Wed, Feb 6, 2013 at 12:03 PM, Gilles <gilles@harfang.homelinux.org> wrote:

> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>
>> On 2/5/13 6:08 AM, Gilles wrote:
>>
>>> Hi.
>>>
>>> In the thread about "static import", Stephen noted that decisions on a
>>> component's evolution are dependent on whether the future of the Java
>>> language is taken into account, or not.
>>> A question on the same theme also arose after the presentation of Commons
>>> Math in FOSDEM 2013.
>>>
>>> If we assume that efficiency is among the important qualities for Commons
>>> Math, the future is to allow usage of the tools provided by the standard
>>> Java library in order to ease the development of multi-threaded algorithms.
>>>
>>> Maintaining Java 1.5 source compatibility for the reason that we may need
>>> to support legacy applications will turn out to be self-defeating:
>>> 1. New users will not consider Commons Math's features that are notably
>>>    apt to parallel processing.
>>> 2. Current users might at some point simply switch to another library if
>>>    it proves more efficient (because it actually uses multi-threading).
>>> 3. New Java developers will be turned away because they will want to use
>>>    the more convenient features of the language in order to provide
>>>    potential contributions.
>>>
>>> If maintaining 1.5 source compatibility is kept as a requirement, the
>>> consequence is that Commons Math will _become_ a legacy library.
>>> In that perspective, implementing/improving algorithms for which a
>>> parallel version is known to be more efficient is plainly a waste of
>>> development and maintenance time.
>>>
>>> In order to mitigate the risks (both of upgrading and of not upgrading
>>> the source compatibility requirement), I would propose to create a new
>>> project (say, "Commons Math MT") where we could implement new features[1]
>>> without being encumbered with the 1.5 requirement.[2]
>>> The "Commons Math MT" would depend on "Commons Math" where we would
>>> continue developing single-thread (and thread-safe) "tasks", i.e.
>>> independent units of processing that could be used in algorithms
>>> located in "Commons Math MT".
>>>
>>> In summary:
>>> - Commons Math (as usual):
>>>   * single-thread (sequential) algorithms,
>>>   * (pure) Java 5,
>>>   * no dependencies.
>>> - Commons Math MT:
>>>   * multi-thread (parallel) algorithms,
>>>   * Java 7 and beyond,
>>>   * JNI allowed,
>>>   * dependencies allowed (jCuda).
>>>
>>> What do you think?
>>>
>>
>> There are several other possibilities to consider:
>>
>> 0) Implement multithreading using JDK 1.5 primitives
>> 1) Set things up within [math] to support parallel execution in JDK
>> 1.7, Hadoop or other frameworks
>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>
>> I think we should maintain a version that has no dependencies and no
>> JNI in any case.
>>
>> Starting a branch and getting concrete about how to parallelize some
>> algorithms would be a good way to start.  One thing I have not
>> really investigated and would be interested in details on is what
>> you actually get in efficiency gain (or loss?) using fork / join vs
>> just using 1.5+ concurrency for the kinds of problems we would end
>> up using this stuff for.
>>
>> Thinking about specific parallelization problem instances would also
>> help decide whether 1) makes sense (i.e., whether it makes sense as
>> you mention above to maintain a single-threaded library that
>> provides task execution for a multithreaded version or multithreaded
>> frameworks).
>>
>> One more thing to consider is that for at least some users of
>> [math], having the library internally spawn threads and/or peg
>> multiple processors may not be desirable.  It is a little misleading
>> to say that multithreading is the way to get "efficiency."  It is
>> really the way to *use* more compute resources and unless there are
>> real algorithmic improvements, the overall efficiency may actually
>> be less, due to task coordination overhead.  What you get is faster
>> execution due to more greedy utilization of available cores.  Actual
>> efficiency (how much overall compute resource it takes to complete a
>> job) partly depends on how efficiently the coordination itself is
>> done (which JDK 1.7 claims to do very well - I have just not seen
>> substantiation or any benchmarks demonstrating this) and how the
>> parallelization affects overall compute requirements.  In any case,
>> for environments where library thread-spawning is not desirable, I
>> think we should maintain a single-threaded version.
>>
>>
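The fork/join versus "1.5+ concurrency" question above can at least be made concrete. What follows is only a sketch under stated assumptions, not a benchmark and not anything from [math]: the class name SumComparison, the chunking scheme, and the threshold of 10000 are all hypothetical. It computes the same array sum once with a Java 5 ExecutorService and once with the Java 7 ForkJoinPool, so that the two code shapes could be timed side by side on a real workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

final class SumComparison {

    // Sequential kernel shared by both variants.
    static double sumRange(double[] data, int from, int to) {
        double s = 0;
        for (int i = from; i < to; i++) {
            s += data[i];
        }
        return s;
    }

    // Java 5 style: split into one fixed chunk per thread, submit Callables.
    static double sumWithExecutor(final double[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            int chunk = (data.length + threads - 1) / threads;
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                parts.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        return sumRange(data, from, to);
                    }
                }));
            }
            double total = 0;
            for (Future<Double> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Java 7 style: recursive splitting until a range is small enough.
    static final class SumTask extends RecursiveTask<Double> {
        private static final int THRESHOLD = 10000; // hypothetical cut-off
        private final double[] data;
        private final int from;
        private final int to;

        SumTask(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Double compute() {
            if (to - from <= THRESHOLD) {
                return sumRange(data, from, to);
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            left.fork();                                   // run left half asynchronously
            double right = new SumTask(data, mid, to).compute();
            return right + left.join();
        }
    }

    static double sumWithForkJoin(double[] data, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods take the thread count as an explicit argument, so a caller who does not want the library to peg every core stays in control. Which variant is faster, and whether either beats the sequential loop once coordination overhead is counted, is exactly what would have to be measured.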
> Unless I missed the point, those reasons are exactly why I propose to
> have 2 projects/components. One, "Commons-Math", does not fiddle with
> resources, while the other would provide a "parallelizationLevel"
> setting for the algorithms written to possibly take advantage of the
> Java 5+ "task framework".
>
> Yes, we could still be good by using only Java 5's concurrency features
> but the issue I raise is not only about concurrency but about
> evolution/progress/maintenance, all things that require raising interest
> from new contributors (unless it's fine that Commons Math be tagged as a
> "library of the past"...).
>
> But using concurrency features in "Commons Math" would also contradict
> your own point ("we should maintain a single-threaded version"): I agree,
> and that's why I proposed this other project...
>
> As for efficiency (or faster execution, if you want), I don't see the
> point in doubting that tasks like global search (e.g. in a genetic
> algorithm) will complete in less time when run in parallel...
>
> As I summarized previously, having a "Commons Math MT" would bring no
> inconvenience, contrary to either your points 0, 1, or 2. [No
> inconvenience to me, that is, but to people with requirements like
> "Java 5 compatible" or "no multi-threading".]
> As I indicated, the basic "task" could be defined in "Commons Math" and
> "Commons Math MT" would provide the parallelization "glue" (e.g. to divide
> the search space of the GA).
>

What about having the MT pieces in a .mt package (or .mt subpackages)?
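
To make the split concrete, here is a rough sketch, assuming a layout that does not exist in [math] today: every name in it (MtLayoutSketch, GridSearchTask, parallelSearch, the toy objective) is hypothetical. The nested task class stands in for what would live in the core package, a sequential and thread-safe unit of work. The static method stands in for the glue that would live in a ".mt" subpackage: it divides the search interval into "parallelizationLevel" slices and runs one task per slice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MtLayoutSketch {

    // Toy objective for illustration only; minimum at x = 3.
    static double objective(double x) {
        return (x - 3) * (x - 3);
    }

    // Core-package side (hypothetical): sequential, no thread handling at all.
    static final class GridSearchTask implements Callable<Double> {
        private final double lo;
        private final double hi;
        private final int steps;

        GridSearchTask(double lo, double hi, int steps) {
            this.lo = lo;
            this.hi = hi;
            this.steps = steps;
        }

        // Returns the grid point in [lo, hi] with the smallest objective value.
        public Double call() {
            double best = lo;
            double bestValue = Double.POSITIVE_INFINITY;
            for (int i = 0; i <= steps; i++) {
                double x = lo + (hi - lo) * i / steps;
                double v = objective(x);
                if (v < bestValue) {
                    bestValue = v;
                    best = x;
                }
            }
            return best;
        }
    }

    // ".mt" side (hypothetical): the only place where threads are created.
    static double parallelSearch(double lo, double hi,
                                 int parallelizationLevel, int stepsPerSlice) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelizationLevel);
        try {
            double width = (hi - lo) / parallelizationLevel;
            List<Future<Double>> results = new ArrayList<Future<Double>>();
            for (int i = 0; i < parallelizationLevel; i++) {
                double a = lo + i * width;
                results.add(pool.submit(new GridSearchTask(a, a + width, stepsPerSlice)));
            }
            // Keep the best candidate among the per-slice winners.
            double best = lo;
            double bestValue = Double.POSITIVE_INFINITY;
            for (Future<Double> result : results) {
                double x = result.get();
                double v = objective(x);
                if (v < bestValue) {
                    bestValue = v;
                    best = x;
                }
            }
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Users who want no multi-threading would call new GridSearchTask(lo, hi, steps).call() directly and never touch the ".mt" code. The sketch sticks to Java 5 concurrency classes, so it says nothing about whether the ".mt" part should require Java 7.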

WRT "divide the search space of the GA", I would think that having it
lumped in with the main project would help have more releases more often.

In [commons] in general it seems painful to make releases: so many
steps and bits, so more projects means more pain.

Gary

>
>
> Gilles


-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
