From: Phil Steitz
To: Commons Developers List
Date: Wed, 06 Feb 2013 09:46:55 -0800
Subject: Re: [Math] Moving on or not?
Message-ID: <5112970F.9030705@gmail.com>

On 2/6/13 9:03 AM, Gilles wrote:
> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>> On 2/5/13 6:08 AM, Gilles wrote:
>>> Hi.
>>>
>>> In the thread about "static import", Stephen noted that decisions on a
>>> component's evolution are dependent on whether the future of the Java
>>> language is taken into account, or not.
>>> A question on the same theme also arose after the presentation of
>>> Commons Math at FOSDEM 2013.
>>>
>>> If we assume that efficiency is among the important qualities for
>>> Commons Math, the future is to allow usage of the tools provided by the
>>> standard Java library in order to ease the development of
>>> multi-threaded algorithms.
>>>
>>> Maintaining Java 1.5 source compatibility because we may need to
>>> support legacy applications will turn out to be self-defeating:
>>>  1. New users will not consider Commons Math's features that are
>>>     notably suited to parallel processing.
>>>  2. Current users might at some point simply switch to another library
>>>     if it proves more efficient (because it actually uses
>>>     multi-threading).
>>>  3. New Java developers will be turned away because they will want to
>>>     use the more convenient features of the language in order to
>>>     provide potential contributions.
>>>
>>> If maintaining 1.5 source compatibility is kept as a requirement, the
>>> consequence is that Commons Math will _become_ a legacy library.
>>> In that perspective, implementing/improving algorithms for which a
>>> parallel version is known to be more efficient is plainly a waste of
>>> development and maintenance time.
>>>
>>> In order to mitigate the risks (both of upgrading and of not upgrading
>>> the source compatibility requirement), I would propose to create a new
>>> project (say, "Commons Math MT") where we could implement new
>>> features[1] without being encumbered with the 1.5 requirement.[2]
>>> The "Commons Math MT" would depend on "Commons Math" where we would
>>> continue developing single-thread (and thread-safe) "tasks", i.e.
>>> independent units of processing that could be used in algorithms
>>> located in "Commons Math MT".
>>>
>>> In summary:
>>>  - Commons Math (as usual):
>>>     * single-thread (sequential) algorithms,
>>>     * (pure) Java 5,
>>>     * no dependencies.
>>>  - Commons Math MT:
>>>     * multi-thread (parallel) algorithms,
>>>     * Java 7 and beyond,
>>>     * JNI allowed,
>>>     * dependencies allowed (JCuda).
>>>
>>> What do you think?
>>
>> There are several other possibilities to consider:
>>
>> 0) Implement multithreading using JDK 1.5 primitives
>> 1) Set things up within [math] to support parallel execution in JDK
>>    1.7, Hadoop or other frameworks
>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>
>> I think we should maintain a version that has no dependencies and no
>> JNI in any case.
>>
>> Starting a branch and getting concrete about how to parallelize some
>> algorithms would be a good way to start.
>> One thing I have not really investigated and would be interested in
>> details on is what you actually get in efficiency gain (or loss?)
>> using fork/join vs just using 1.5+ concurrency for the kinds of
>> problems we would end up using this stuff for.
>>
>> Thinking about specific parallelization problem instances would also
>> help decide whether 1) makes sense (i.e., whether it makes sense, as
>> you mention above, to maintain a single-threaded library that
>> provides task execution for a multithreaded version or multithreaded
>> frameworks).
>>
>> One more thing to consider is that for at least some users of
>> [math], having the library internally spawn threads and/or peg
>> multiple processors may not be desirable. It is a little misleading
>> to say that multithreading is the way to get "efficiency." It is
>> really the way to *use* more compute resources, and unless there are
>> real algorithmic improvements, the overall efficiency may actually
>> be less, due to task coordination overhead. What you get is faster
>> execution due to more greedy utilization of available cores. Actual
>> efficiency (how much overall compute resource it takes to complete a
>> job) partly depends on how efficiently the coordination itself is
>> done (which JDK 1.7 claims to do very well; I have just not seen
>> substantiation or any benchmarks demonstrating this) and how the
>> parallelization affects overall compute requirements. In any case,
>> for environments where library thread-spawning is not desirable, I
>> think we should maintain a single-threaded version.
>>
>
> Unless I missed the point, those reasons are exactly why I propose to
> have 2 projects/components. One, "Commons-Math", does not fiddle with
> resources, while the other would provide a "parallelizationLevel"
> setting for the algorithms written to possibly take advantage of the
> Java 5+ "task framework".

OK, what about the 4.x option?
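To make option 0 and the "parallelizationLevel" idea concrete, here is a minimal sketch that uses only Java 5 concurrency primitives (ExecutorService, Callable, Future; anonymous classes, no lambdas). It is a plain grid search rather than any existing Commons Math algorithm; the class `GridSearch`, the interface `Objective` and the method signatures are hypothetical, chosen for illustration only. A level of 1 or less keeps all work in the caller's thread, so no threads are spawned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: grid search split into independent tasks, Java 5 APIs only. */
class GridSearch {

    /** Hypothetical objective to minimize (the single-threaded "task" side). */
    interface Objective {
        double value(double x);
    }

    /** Sequential unit of work: index of the lowest value on grid points [from, to). */
    static int bestIndex(Objective f, double lo, double step, int from, int to) {
        int best = from;
        double bestValue = f.value(lo + from * step);
        for (int i = from + 1; i < to; i++) {
            double v = f.value(lo + i * step);
            if (v < bestValue) { // strict "<" keeps the first minimum on ties
                bestValue = v;
                best = i;
            }
        }
        return best;
    }

    /**
     * Minimizes f on the n grid points lo + i * step, step = (hi - lo) / n.
     * parallelizationLevel <= 1 runs entirely in the caller's thread.
     */
    static double minimize(final Objective f, final double lo, double hi, int n,
                           int parallelizationLevel) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        final double step = (hi - lo) / n;
        if (parallelizationLevel <= 1) {
            return lo + bestIndex(f, lo, step, 0, n) * step; // no threads spawned
        }
        ExecutorService pool = Executors.newFixedThreadPool(parallelizationLevel);
        try {
            List<Future<Integer>> parts = new ArrayList<Future<Integer>>();
            int chunk = (n + parallelizationLevel - 1) / parallelizationLevel;
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                parts.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return bestIndex(f, lo, step, from, to);
                    }
                }));
            }
            // Merge chunk winners in order, so ties resolve as in the sequential scan.
            int best = parts.get(0).get();
            for (Future<Integer> part : parts) {
                int i = part.get();
                if (f.value(lo + i * step) < f.value(lo + best * step)) {
                    best = i;
                }
            }
            return lo + best * step;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because chunk winners are merged left to right with a strict comparison, any parallelization level returns the same point as the sequential path, which makes the two easy to compare in a benchmark.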
>
> Yes, we could still be good by using only Java 5's concurrency
> features, but the issue I raise is not only about concurrency but about
> evolution/progress/maintenance, all things that require raising
> interest from new contributors (unless it's fine that Commons Math be
> tagged as a "library of the past"...).

+1 for experimenting with parallelization. I would just like to
understand if the JDK 7 stuff really adds much; in particular, does it
handle coordination / CPU allocation better than you could easily do it
with 1.5? More supported JDKs == more potential users, so I would like
to see a real reason to bump the JDK level.

>
> But using concurrency features in "Commons Math" would also contradict
> your own point ("we should maintain a single-threaded version"): I
> agree, and that's why I proposed this other project...
>
> As for efficiency (or faster execution, if you want), I don't see the
> point in doubting that tasks like global search (e.g. in a genetic
> algorithm) will complete in less time when run in parallel...
>
> As I summarized previously, having a "Commons Math MT" would bring no
> inconvenience, contrary to any of your points 0, 1, or 2. (No
> inconvenience, that is, not for me but for people with requirements
> like "Java 5 compatible" or "no multi-threading".)
> As I indicated, the basic "task" could be defined in "Commons Math"
> and "Commons Math MT" would provide the parallelization "glue" (e.g.
> to divide the search space of the GA).

I think it is best at this point to cut a branch and actually start
working on specific algorithms. Having a set of candidate algorithms
for parallelization will help us decide what we actually need and how
it might work. I would personally favor the 4.x approach, with
thread-spawning behavior configurable.
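For comparison with the JDK 1.5 approach, the following sketches the same hypothetical grid search in the JDK 7 fork/join style that a 4.x branch could use. The inner sequential scan plays the role of the single-threaded "task", and the range splitting in `compute()` is the parallelization "glue". The names `ForkJoinGridSearch`, `Objective`, `Search` and the cut-off `THRESHOLD = 1000` are assumptions for illustration, not Commons Math API, and nothing here measures or claims a speed-up.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical sketch: the same grid search written with JDK 7 fork/join. */
class ForkJoinGridSearch {

    /** Hypothetical objective to minimize. */
    interface Objective {
        double value(double x);
    }

    /** Finds the lowest-valued grid index in [from, to), splitting large ranges. */
    static final class Search extends RecursiveTask<Integer> {
        private static final int THRESHOLD = 1000; // assumed cut-off, not tuned
        private final Objective f;
        private final double lo;
        private final double step;
        private final int from;
        private final int to;

        Search(Objective f, double lo, double step, int from, int to) {
            this.f = f;
            this.lo = lo;
            this.step = step;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) {
                // Small range: plain sequential scan, first minimum wins on ties.
                int best = from;
                double bestValue = f.value(lo + from * step);
                for (int i = from + 1; i < to; i++) {
                    double v = f.value(lo + i * step);
                    if (v < bestValue) {
                        bestValue = v;
                        best = i;
                    }
                }
                return best;
            }
            int mid = (from + to) >>> 1;
            Search left = new Search(f, lo, step, from, mid);
            Search right = new Search(f, lo, step, mid, to);
            left.fork();               // make the left half available to other workers
            int r = right.compute();   // process the right half in this thread
            int l = left.join();       // wait for the left half (may run it here if not taken)
            // Prefer the left index on ties, matching a left-to-right scan.
            return f.value(lo + r * step) < f.value(lo + l * step) ? r : l;
        }
    }

    /** parallelism is the number of worker threads requested for the pool. */
    static double minimize(Objective f, double lo, double hi, int n, int parallelism) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        double step = (hi - lo) / n;
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            int best = pool.invoke(new Search(f, lo, step, 0, n));
            return lo + best * step;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running this and the executor-based version on a realistic objective would be one way to get the fork/join vs 1.5 numbers asked for in the thread.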
Phil

>
> Gilles
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org