commons-dev mailing list archives

From Konstantin Berlin <kber...@gmail.com>
Subject Re: [Math] Moving on or not?
Date Thu, 07 Feb 2013 17:45:31 GMT
On Feb 7, 2013, at 11:05 AM, Gilles <gilles@harfang.homelinux.org> wrote:

> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>> On 2/7/13 4:58 AM, Gilles wrote:
>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> In the thread about "static import", Stephen noted that decisions on a
>>>>>>> component's evolution are dependent on whether the future of the Java
>>>>>>> language is taken into account, or not.
>>>>>>> A question on the same theme also arose after the presentation of Commons
>>>>>>> Math in FOSDEM 2013.
>>>>>>>
>>>>>>> If we assume that efficiency is among the important qualities for Commons
>>>>>>> Math, the future is to allow usage of the tools provided by the standard
>>>>>>> Java library in order to ease the development of multi-threaded algorithms.
>>>>>>>
>>>>>>> Maintaining Java 1.5 source compatibility for the reason that we may need
>>>>>>> to support legacy applications will turn out to be self-defeating:
>>>>>>> 1. New users will not consider Commons Math's features that are notably
>>>>>>>   apt to parallel processing.
>>>>>>> 2. Current users might at some point simply switch to another library if
>>>>>>>   it proves more efficient (because it actually uses multi-threading).
>>>>>>> 3. New Java developers will be turned away because they will want to use
>>>>>>>   the more convenient features of the language in order to provide
>>>>>>>   potential contributions.
>>>>>>>
>>>>>>> If maintaining 1.5 source compatibility is kept as a requirement, the
>>>>>>> consequence is that Commons Math will _become_ a legacy library.
>>>>>>> In that perspective, implementing/improving algorithms for which a
>>>>>>> parallel version is known to be more efficient is plainly a waste of
>>>>>>> development and maintenance time.
>>>>>>>
>>>>>>> In order to mitigate the risks (both of upgrading and of not upgrading
>>>>>>> the source compatibility requirement), I would propose to create a new
>>>>>>> project (say, "Commons Math MT") where we could implement new features[1]
>>>>>>> without being encumbered with the 1.5 requirement.[2]
>>>>>>> The "Commons Math MT" would depend on "Commons Math" where we would
>>>>>>> continue developing single-thread (and thread-safe) "tasks", i.e.
>>>>>>> independent units of processing that could be used in algorithms
>>>>>>> located in "Commons Math MT".
>>>>>>>
>>>>>>> In summary:
>>>>>>> - Commons Math (as usual):
>>>>>>>  * single-thread (sequential) algorithms,
>>>>>>>  * (pure) Java 5,
>>>>>>>  * no dependencies.
>>>>>>> - Commons Math MT:
>>>>>>>  * multi-thread (parallel) algorithms,
>>>>>>>  * Java 7 and beyond,
>>>>>>>  * JNI allowed,
>>>>>>>  * dependencies allowed (jCuda).
>>>>>>>
>>>>>>> What do you think?
>>>>>>
>>>>>> There are several other possibilities to consider:
>>>>>>
>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>> 1) Set things up within [math] to support parallel execution in JDK
>>>>>> 1.7, Hadoop or other frameworks
>>>>>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>>>>>
>>>>>> I think we should maintain a version that has no dependencies and no
>>>>>> JNI in any case.
>>>>>>
>>>>>> Starting a branch and getting concrete about how to parallelize some
>>>>>> algorithms would be a good way to start.  One thing I have not
>>>>>> really investigated and would be interested in details on is what
>>>>>> you actually get in efficiency gain (or loss?) using fork / join vs
>>>>>> just using 1.5+ concurrency for the kinds of problems we would end
>>>>>> up using this stuff for.
>>>>>>
>>>>>> Thinking about specific parallelization problem instances would also
>>>>>> help decide whether 1) makes sense (i.e., whether it makes sense as
>>>>>> you mention above to maintain a single-threaded library that
>>>>>> provides task execution for a multithreaded version or multithreaded
>>>>>> frameworks).
>>>>>>
>>>>>> One more thing to consider is that for at least some users of
>>>>>> [math], having the library internally spawn threads and/or peg
>>>>>> multiple processors may not be desirable.  It is a little misleading
>>>>>> to say that multithreading is the way to get "efficiency."  It is
>>>>>> really the way to *use* more compute resources and unless there are
>>>>>> real algorithmic improvements, the overall efficiency may actually
>>>>>> be less, due to task coordination overhead.  What you get is faster
>>>>>> execution due to more greedy utilization of available cores.  Actual
>>>>>> efficiency (how much overall compute resource it takes to complete a
>>>>>> job) partly depends on how efficiently the coordination itself is
>>>>>> done (which JDK 1.7 claims to do very well - I have just not seen
>>>>>> substantiation or any benchmarks demonstrating this) and how the
>>>>>> parallelization affects overall compute requirements.  In any case,
>>>>>> for environments where library thread-spawning is not desirable, I
>>>>>> think we should maintain a single-threaded version.
>>>>>
>>>>> Unless I missed the point, those reasons are exactly why I propose to
>>>>> have 2 projects/components. One, "Commons-Math", does not fiddle with
>>>>> resources, while the other would provide a "parallelizationLevel"
>>>>> setting for the algorithms written to possibly take advantage of the
>>>>> Java 5+ "task framework".
>>>>
>>>> OK, what about the 4.x option?
>>>>>
>>>>> Yes, we could still be good by using only Java 5's concurrency features
>>>>> but the issue I raise is not only about concurrency but about
>>>>> evolution/progress/maintenance, all things that require raising interest
>>>>> from new contributors (unless it's fine that Commons Math be tagged as a
>>>>> "library of the past"...).
>>>>
>>>> +1 for experimenting with parallelization.  I would just like to
>>>> understand if the JDK 7 stuff really adds much - in particular, does
>>>> it handle coordination / cpu allocation better than you could easily
>>>> do it with 1.5.  More supported JDKs == more potential users, so I
>>>> like to see a real reason to bump the JDK level.
>>>>>
>>>>> But using concurrency features in "Commons Math" would also contradict
>>>>> your own point ("we should maintain a single-threaded version"): I agree,
>>>>> and that's why I proposed this other project...
>>>>>
>>>>> As for efficiency (or faster execution, if you want), I don't see the
>>>>> point in doubting that tasks like global search (e.g. in a genetic
>>>>> algorithm) will complete in less time when run in parallel...
>>>>>
>>>>> As I summarized previously, having a "Commons Math MT" would bring no
>>>>> inconvenience, contrary to either your points 0, 1, or 2. [No
>>>>> inconvenience to me, that is, but to people with requirements like
>>>>> "Java 5 compatible" or "no multi-threading".]
>>>>> As I indicated, the basic "task" could be defined in "Commons Math" and
>>>>> "Commons Math MT" would provide the parallelization "glue" (e.g. to divide
>>>>> the search space of the GA).
>>>>
>>>> I think it is best at this point to cut a branch and actually start
>>>> working on specific algorithms.  Having a set of candidate
>>>> algorithms for parallelization will help us decide what we actually
>>>> need and how it might work.  I would personally favor the 4.x
>>>> approach, with thread-spawning behavior configurable.
>>>
>>> It seems fair to wait until parallel algorithms are actually implemented.
>>>
>>> However it is not clear what you mean with "the 4.x approach": if it is
>>> actually allowing Java 7, that would mean that, starting from 4.0, we'll
>>> indeed drop support of earlier JVMs!
>>> Why would this be preferred to having 2 projects? Of course, if everyone
>>> agrees to that move to Java 7, that's fine. :-)
>>
>> What I meant was that instead of creating a new component, we would
>> just create a new release line.  Like what tomcat does for servlet
>> spec versions.  I guess this does mean that we end up having to
>> stabilize the 3.x APIs because no additional "major" release would
>> be allowed in that line.  That would be a *good thing* IMO as long
>> as we can do it cleanly.  If not, maybe we end up having to use 5.x
>> for the JDK 1.7+ version, using 4.0 to get to a stable API for the
>> current trunk code.
>
> There's still the human resource problem: we don't have it to maintain
> a single branch; having two will only make it worse.
>

Exactly. So why do this when cleaning up linear algebra and
optimization seems like a much better use of time than trying to
parallelize algorithms?
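
For concreteness, the split discussed in the quoted thread (a sequential,
thread-safe "task" plus separate parallelization "glue" with a configurable
"parallelizationLevel") could look roughly like the sketch below. This is an
illustration only, not code from Commons Math: the class ParallelSearchSketch,
the SearchTask type, the minimize() method and the toy objective are all
hypothetical names invented here, and a grid search stands in for a real
global-search algorithm. It uses only java.util.concurrent classes that have
been available since Java 5 (no fork/join), and with a level of 1 it runs in
the calling thread without spawning any threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSearchSketch {

    /** Toy objective (an assumption for this sketch): minimum at x = 2. */
    static double objective(double x) {
        return (x - 2.0) * (x - 2.0);
    }

    /** Sequential, thread-safe unit of work: grid search on [lo, hi]. */
    static final class SearchTask implements Callable<Double> {
        private final double lo;
        private final double hi;
        private final int samples;

        SearchTask(double lo, double hi, int samples) {
            this.lo = lo;
            this.hi = hi;
            this.samples = samples;
        }

        /** Returns the grid point with the smallest objective value. */
        public Double call() {
            double bestX = lo;
            double bestF = objective(lo);
            for (int i = 1; i <= samples; i++) {
                double x = lo + (hi - lo) * i / samples;
                double f = objective(x);
                if (f < bestF) {
                    bestF = f;
                    bestX = x;
                }
            }
            return bestX;
        }
    }

    /**
     * The "glue": splits [lo, hi] into parallelizationLevel sub-intervals,
     * runs one SearchTask per sub-interval, and keeps the best candidate.
     */
    public static double minimize(double lo, double hi, int samplesPerTask,
                                  int parallelizationLevel) throws Exception {
        if (parallelizationLevel < 1) {
            throw new IllegalArgumentException("parallelizationLevel must be >= 1");
        }
        List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
        double width = (hi - lo) / parallelizationLevel;
        for (int i = 0; i < parallelizationLevel; i++) {
            tasks.add(new SearchTask(lo + i * width, lo + (i + 1) * width, samplesPerTask));
        }

        List<Double> candidates = new ArrayList<Double>();
        if (parallelizationLevel == 1) {
            // Single-threaded mode: no pool, no extra threads.
            candidates.add(tasks.get(0).call());
        } else {
            ExecutorService pool = Executors.newFixedThreadPool(parallelizationLevel);
            try {
                for (Future<Double> result : pool.invokeAll(tasks)) {
                    candidates.add(result.get());
                }
            } finally {
                pool.shutdown();
            }
        }

        double best = candidates.get(0);
        for (double x : candidates) {
            if (objective(x) < objective(best)) {
                best = x;
            }
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sequential: " + minimize(-10.0, 10.0, 100000, 1));
        System.out.println("4 threads:  " + minimize(-10.0, 10.0, 100000, 4));
    }
}
```

Whether such a decomposition actually finishes sooner, and at what cost in
total CPU time, depends on the problem size and on coordination overhead, so
it would need to be benchmarked rather than assumed.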


