From: Phil Steitz
Date: Fri, 08 Feb 2013 06:20:56 -0800
To: Commons Developers List
Subject: Re: [Math] Moving on or not?

On 2/8/13 12:04 AM, Luc Maisonobe wrote:
> On 08/02/2013 03:21, Konstantin Berlin wrote:
>> Sorry, but none of this is making sense to me. We had a long discussion
>> about how the library doesn't test for large-scale problem
>> performance. A lot of algorithms probably do not scale well as a
>> result. There was talk of dropping sparse support in linear algebra.
>> So instead of fixing that, you jump to parallelization, which is
>> needed only for large-scale problems, which this library does not
>> handle well even in a single thread right now.
>>
>> The most significant impact you can have is fixing the linear algebra
>> component.
> I agree with this.
> Also, in order to avoid spreading our attention too much on keeping
> several branches in sync, I would suggest not creating a new component
> but deciding directly that we will not support Java 5 anymore as of
> Apache Commons Math 4.0, so people can progressively use the new
> features of the language and experiment directly on the trunk.

Actually, to get anything, you would need to bump to 1.7, abandoning
1.6 as well. That would effectively mean abandoning a large segment
(likely the majority) of the user base. I would not like to do that.
So if we don't have the energy to maintain two lines, I would say
hold off requiring Java 7 until we have stabilized the API and fixed
things like the above.

Phil
>
> best regards,
> Luc
>
>> On Feb 7, 2013, at 5:06 PM, Gilles wrote:
>>
>>> On Thu, 07 Feb 2013 08:32:46 -0800, Phil Steitz wrote:
>>>> On 2/7/13 8:04 AM, Gilles wrote:
>>>>> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>>>>>> On 2/7/13 4:58 AM, Gilles wrote:
>>>>>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>>>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>>>>>> Hi.
>>>>>>>>>>>
>>>>>>>>>>> In the thread about "static import", Stephen noted that
>>>>>>>>>>> decisions on a component's evolution depend on whether the
>>>>>>>>>>> future of the Java language is taken into account or not.
>>>>>>>>>>> A question on the same theme also arose after the
>>>>>>>>>>> presentation of Commons Math at FOSDEM 2013.
>>>>>>>>>>>
>>>>>>>>>>> If we assume that efficiency is among the important
>>>>>>>>>>> qualities for Commons Math, the future is to allow usage of
>>>>>>>>>>> the tools provided by the standard Java library in order to
>>>>>>>>>>> ease the development of multi-threaded algorithms.
>>>>>>>>>>>
>>>>>>>>>>> Maintaining Java 1.5 source compatibility for the reason that
>>>>>>>>>>> we may need to support legacy applications will turn out to be
>>>>>>>>>>> self-defeating:
>>>>>>>>>>> 1. New users will not consider Commons Math for features that
>>>>>>>>>>>    are notably suited to parallel processing.
>>>>>>>>>>> 2. Current users might at some point simply switch to another
>>>>>>>>>>>    library if it proves more efficient (because it actually
>>>>>>>>>>>    uses multi-threading).
>>>>>>>>>>> 3. New Java developers will be turned away because they will
>>>>>>>>>>>    want to use the more convenient features of the language
>>>>>>>>>>>    when contributing.
>>>>>>>>>>>
>>>>>>>>>>> If maintaining 1.5 source compatibility is kept as a
>>>>>>>>>>> requirement, the consequence is that Commons Math will
>>>>>>>>>>> _become_ a legacy library. From that perspective,
>>>>>>>>>>> implementing/improving algorithms for which a parallel version
>>>>>>>>>>> is known to be more efficient is plainly a waste of
>>>>>>>>>>> development and maintenance time.
>>>>>>>>>>>
>>>>>>>>>>> In order to mitigate the risks (both of upgrading and of not
>>>>>>>>>>> upgrading the source compatibility requirement), I would
>>>>>>>>>>> propose creating a new project (say, "Commons Math MT") where
>>>>>>>>>>> we could implement new features[1] without being encumbered by
>>>>>>>>>>> the 1.5 requirement.[2]
>>>>>>>>>>> "Commons Math MT" would depend on "Commons Math", where we
>>>>>>>>>>> would continue developing single-thread (and thread-safe)
>>>>>>>>>>> "tasks", i.e. independent units of processing that could be
>>>>>>>>>>> used in algorithms located in "Commons Math MT".
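Gilles's split between single-threaded, thread-safe "tasks" in Commons Math and parallel "glue" in a separate component can be illustrated with nothing but java.util.concurrent, which has been in the JDK since Java 5. The sketch below is hypothetical: the class names, the sum-of-squares workload and the chunking strategy are assumptions for illustration, not Commons Math API; "parallelizationLevel" is the setting name Gilles suggests elsewhere in this thread, with 1 meaning "run sequentially, create no threads".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskGlueSketch {

    /**
     * Hypothetical single-threaded "task" of the kind that could live in the
     * core library: it contains no threading logic and is thread-safe because
     * its fields are never modified after construction.
     */
    static final class PartialSumOfSquares implements Callable<Double> {
        private final double[] data;
        private final int from; // inclusive
        private final int to;   // exclusive

        PartialSumOfSquares(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        public Double call() {
            double sum = 0.0;
            for (int i = from; i < to; i++) {
                sum += data[i] * data[i];
            }
            return sum;
        }
    }

    /**
     * Hypothetical parallel "glue": splits the index range into chunks and
     * submits one task per chunk to an executor. A parallelizationLevel of 1
     * (or less) runs everything on the calling thread.
     */
    static double sumOfSquares(double[] data, int parallelizationLevel) {
        if (parallelizationLevel <= 1) {
            return new PartialSumOfSquares(data, 0, data.length).call();
        }
        ExecutorService executor = Executors.newFixedThreadPool(parallelizationLevel);
        try {
            List<Future<Double>> futures = new ArrayList<Future<Double>>();
            int chunk = (data.length + parallelizationLevel - 1) / parallelizationLevel;
            for (int from = 0; from < data.length; from += chunk) {
                int to = Math.min(from + chunk, data.length);
                futures.add(executor.submit(new PartialSumOfSquares(data, from, to)));
            }
            double total = 0.0;
            for (Future<Double> future : futures) {
                total += future.get(); // blocks until this chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sumOfSquares(data, 1));
        System.out.println(sumOfSquares(data, 4));
    }
}
```

The task class knows nothing about threads, so it could be used unchanged by a sequential caller, by this executor-based glue, or by an external framework. For general data the parallel result can differ in the last bits from the sequential one, because the additions are grouped differently.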
>>>>>>>>>>>
>>>>>>>>>>> In summary:
>>>>>>>>>>> - Commons Math (as usual):
>>>>>>>>>>>   * single-thread (sequential) algorithms,
>>>>>>>>>>>   * (pure) Java 5,
>>>>>>>>>>>   * no dependencies.
>>>>>>>>>>> - Commons Math MT:
>>>>>>>>>>>   * multi-thread (parallel) algorithms,
>>>>>>>>>>>   * Java 7 and beyond,
>>>>>>>>>>>   * JNI allowed,
>>>>>>>>>>>   * dependencies allowed (jCuda).
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>> There are several other possibilities to consider:
>>>>>>>>>>
>>>>>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>>>>>> 1) Set things up within [math] to support parallel execution in
>>>>>>>>>>    JDK 1.7, Hadoop or other frameworks
>>>>>>>>>> 2) Instead of a new project, start a 4.x branch targeting
>>>>>>>>>>    JDK 1.7
>>>>>>>>>>
>>>>>>>>>> I think we should maintain a version that has no dependencies
>>>>>>>>>> and no JNI in any case.
>>>>>>>>>>
>>>>>>>>>> Starting a branch and getting concrete about how to parallelize
>>>>>>>>>> some algorithms would be a good way to start. One thing I have
>>>>>>>>>> not really investigated, and would be interested in details on,
>>>>>>>>>> is what you actually get in efficiency gain (or loss?) using
>>>>>>>>>> fork/join vs. just using 1.5+ concurrency for the kinds of
>>>>>>>>>> problems we would end up using this stuff for.
>>>>>>>>>>
>>>>>>>>>> Thinking about specific parallelization problem instances would
>>>>>>>>>> also help decide whether 1) makes sense (i.e., whether it makes
>>>>>>>>>> sense, as you mention above, to maintain a single-threaded
>>>>>>>>>> library that provides task execution for a multithreaded
>>>>>>>>>> version or multithreaded frameworks).
>>>>>>>>>>
>>>>>>>>>> One more thing to consider is that for at least some users of
>>>>>>>>>> [math], having the library internally spawn threads and/or peg
>>>>>>>>>> multiple processors may not be desirable.
>>>>>>>>>> It is a little misleading to say that multithreading is the way
>>>>>>>>>> to get "efficiency." It is really the way to *use* more compute
>>>>>>>>>> resources, and unless there are real algorithmic improvements,
>>>>>>>>>> the overall efficiency may actually be lower, due to task
>>>>>>>>>> coordination overhead. What you get is faster execution due to
>>>>>>>>>> greedier utilization of available cores. Actual efficiency (how
>>>>>>>>>> much overall compute resource it takes to complete a job)
>>>>>>>>>> partly depends on how efficiently the coordination itself is
>>>>>>>>>> done (which JDK 1.7 claims to do very well; I have just not
>>>>>>>>>> seen substantiation or any benchmarks demonstrating this) and on
>>>>>>>>>> how the parallelization affects overall compute requirements.
>>>>>>>>>> In any case, for environments where library thread-spawning is
>>>>>>>>>> not desirable, I think we should maintain a single-threaded
>>>>>>>>>> version.
>>>>>>>>> Unless I missed the point, those reasons are exactly why I
>>>>>>>>> propose to have 2 projects/components. One, "Commons-Math",
>>>>>>>>> does not fiddle with resources, while the other would provide a
>>>>>>>>> "parallelizationLevel" setting for the algorithms written to
>>>>>>>>> possibly take advantage of the Java 5+ "task framework".
>>>>>>>> OK, what about the 4.x option?
>>>>>>>>> Yes, we could still be good by using only Java 5's concurrency
>>>>>>>>> features, but the issue I raise is not only about concurrency
>>>>>>>>> but about evolution/progress/maintenance, all things that
>>>>>>>>> require raising interest from new contributors (unless it's fine
>>>>>>>>> that Commons Math be tagged as a "library of the past"...).
>>>>>>>> +1 for experimenting with parallelization.
>>>>>>>> I would just like to understand whether the JDK 7 stuff really
>>>>>>>> adds much; in particular, does it handle coordination / CPU
>>>>>>>> allocation better than you could easily do it with 1.5? More
>>>>>>>> supported JDKs == more potential users, so I would like to see a
>>>>>>>> real reason to bump the JDK level.
>>>>>>>>> But using concurrency features in "Commons Math" would also
>>>>>>>>> contradict your own point ("we should maintain a
>>>>>>>>> single-threaded version"): I agree, and that's why I proposed
>>>>>>>>> this other project...
>>>>>>>>>
>>>>>>>>> As for efficiency (or faster execution, if you want), I don't
>>>>>>>>> see the point in doubting that tasks like global search (e.g. in
>>>>>>>>> a genetic algorithm) will complete in less time when run in
>>>>>>>>> parallel...
>>>>>>>>>
>>>>>>>>> As I summarized previously, having a "Commons Math MT" would
>>>>>>>>> bring no inconvenience, contrary to any of your points 0, 1, or
>>>>>>>>> 2. [No inconvenience to me, that is, but to people with
>>>>>>>>> requirements like "Java 5 compatible" or "no multi-threading".]
>>>>>>>>> As I indicated, the basic "task" could be defined in "Commons
>>>>>>>>> Math" and "Commons Math MT" would provide the parallelization
>>>>>>>>> "glue" (e.g. to divide the search space of the GA).
>>>>>>>> I think it is best at this point to cut a branch and actually
>>>>>>>> start working on specific algorithms. Having a set of candidate
>>>>>>>> algorithms for parallelization will help us decide what we
>>>>>>>> actually need and how it might work. I would personally favor
>>>>>>>> the 4.x approach, with configurable thread-spawning behavior.
>>>>>>> It seems fair to wait until parallel algorithms are actually
>>>>>>> implemented.
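Phil's question is whether the JDK 1.7 fork/join framework (java.util.concurrent.ForkJoinPool and RecursiveTask) buys much over Java 5 executors, and Gilles's example is dividing the search space of a global search. A minimal sketch of what such fork/join code looks like, assuming a hypothetical one-dimensional objective function and a brute-force grid search standing in for the real search; the THRESHOLD value is arbitrary, nothing here is Commons Math code, and it makes no performance claim.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSearchSketch {

    /** Hypothetical objective to minimize; chosen only for illustration. */
    static double objective(double x) {
        return (x - 3.25) * (x - 3.25) + 1.0;
    }

    /** Returns the grid index in [lo, hi) whose point has the smallest objective value. */
    static final class GridSearch extends RecursiveTask<Integer> {
        private static final int THRESHOLD = 1000; // arbitrary cut-off for sequential work
        private final double start;
        private final double step;
        private final int lo; // inclusive
        private final int hi; // exclusive

        GridSearch(double start, double step, int lo, int hi) {
            this.start = start;
            this.step = step;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Integer compute() {
            if (hi - lo <= THRESHOLD) {
                // Small enough: plain sequential scan.
                int best = lo;
                double bestValue = objective(start + lo * step);
                for (int i = lo + 1; i < hi; i++) {
                    double value = objective(start + i * step);
                    if (value < bestValue) {
                        best = i;
                        bestValue = value;
                    }
                }
                return best;
            }
            // Otherwise split the search space into two halves.
            int mid = (lo + hi) >>> 1;
            GridSearch left = new GridSearch(start, step, lo, mid);
            GridSearch right = new GridSearch(start, step, mid, hi);
            left.fork();                     // schedule the left half asynchronously
            int rightBest = right.compute(); // search the right half in this thread
            int leftBest = left.join();      // wait for the left half's result
            return objective(start + leftBest * step) <= objective(start + rightBest * step)
                    ? leftBest : rightBest;
        }
    }

    /** Searches start, start + step, ..., start + (n - 1) * step with the given parallelism. */
    static double argMin(double start, double step, int n, int parallelism) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            int best = pool.invoke(new GridSearch(start, step, 0, n));
            return start + best * step;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(argMin(0.0, 0.001, 10000, 4));
    }
}
```

Each task either scans a small range sequentially or forks one half and computes the other itself, and the pool's work stealing spreads the forked halves across its worker threads. Whether this is easier or faster than feeding Callable chunks to a fixed thread pool is exactly the benchmark question raised in this thread; the sketch does not answer it.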
>>>>>>>
>>>>>>> However, it is not clear what you mean by "the 4.x approach": if
>>>>>>> it is actually allowing Java 7, that would mean that, starting
>>>>>>> from 4.0, we'll indeed drop support for earlier JVMs!
>>>>>>> Why would this be preferred to having 2 projects? Of course, if
>>>>>>> everyone agrees to that move to Java 7, that's fine. :-)
>>>>>> What I meant was that instead of creating a new component, we would
>>>>>> just create a new release line, like what Tomcat does for servlet
>>>>>> spec versions. I guess this does mean that we end up having to
>>>>>> stabilize the 3.x APIs, because no additional "major" release would
>>>>>> be allowed in that line. That would be a *good thing* IMO, as long
>>>>>> as we can do it cleanly. If not, maybe we end up having to use 5.x
>>>>>> for the JDK 1.7+ version, using 4.0 to get to a stable API for the
>>>>>> current trunk code.
>>>>> There's still the human resource problem: we don't have enough
>>>>> people to maintain a single branch; having two will only make it
>>>>> worse.
>>>> Yes, but the "new project" approach has the same problem.
>>> Yes.
>>> However, I meant it as a way to separate concerns, as shown
>>> by diverging opinions, even among the few people who take part
>>> in this discussion or in previous ones about the same subject.
>>>
>>> A sibling (not separate!) project could allow interested
>>> people to experiment while not adding yet another "distraction"
>>> to the main project, where people more focused on the
>>> mathematical (for lack of a better word) side can continue
>>> their own improvements.
>>> A healthy interaction could even come out of having a "public"
>>> use-case in the form of a project that needs certain facilities
>>> (algorithms as tasks) in order to provide multi-thread
>>> utilities to users (who might prefer not to have to implement
>>> them themselves at a higher level).
>>>
>>>>>>> On the other hand, if we keep Java 5, at least until we get use
>>>>>>> cases or contributions that would benefit from features in JDKs
>>>>>>> newer than 1.5, there is no need to create a branch; we can just
>>>>>>> go on adding multi-threaded code to the trunk (to become part[1]
>>>>>>> of the upcoming 3.x releases).
>>>>>> That is why I wanted to get a feel for what the JDK 1.7 stuff
>>>>>> really buys you. Has anyone seen benchmarks showing better
>>>>>> performance using 1.7 than can be obtained just using 1.5
>>>>>> concurrency primitives?
>>>>> Again, there are separate issues:
>>>>> 1. Coding in Java 7
>>>>> 2. Running with the JVM shipped with JDK 1.7
>>>>>
>>>>> The newer JVMs are faster, independently of whether new features
>>>>> of the language are used.
>>>>> But it could well be that some of the new features allow even
>>>>> better performance (as is foreseen for Java 8).
>>>> Agreed. I am interested in understanding better both how much
>>>> easier it actually is to code and whether the 1.7 framework
>>>> materially improves scheduling / allocation over what you could do
>>>> just using 1.5 primitives.
>>> I cannot provide proof, but nor is anyone on this list
>>> eager to prove the contrary; hence the proposal to set
>>> up a "playground".
>>>
>>>>>> Has anyone used 1.7 to parallelize numerical algorithms
>>>>>> and found it really easier / more performant?
>>>>> Where are those people who could answer?
>>>> This is a public list :)
>>>>> That is one of the points I raised. If we maintain source
>>>>> compatibility with a language version that is 9 years old, not
>>>>> many contributors are going to be interested, thus reducing the
>>>>> chance of getting answers...
>>>>>
>>>>>> Any opinions / responses to Konstantin's comment on where
>>>>>> parallelization should be implemented, i.e. in the library vs.
>>>>>> somewhere up the stack?
>>>>> What was the _question_? ...
>>>> The question he implicitly raised was whether or not it makes sense
>>>> for a low-level library to parallelize tasks / run across cores.
>>> In several areas, CM is not a low-level library (GA, multi-start
>>> optimizers for example). In other areas like FFT, a user can
>>> legitimately expect top performance without having to handle
>>> parallelization by himself.
>>>
>>>> This is a legitimate question. It may actually be better to set
>>>> things up so that higher-level frameworks or applications can
>>>> arrange parallel execution rather than embedding it in the
>>>> low-level library itself. This is also what I was referring to when
>>>> I said that in some contexts, thread-spawning / CPU hogging may not
>>>> be desirable.
>>> For several cases (GA, FFT, multi-start optimizers), I have the
>>> opposite viewpoint: multi-threading is an implementation detail
>>> that could be handled at a _lower_ level. Of course, the user can
>>> decide whether to enable more than one thread.
>>>
>>>>>> Any ideas how to set things up so that [math] code can play
>>>>>> nicely with concurrency frameworks?
>>>>> That's a strange question in the context of a project that tries
>>>>> hard not to have any dependency.
>>>> I did not necessarily mean to bring in dependencies, but rather to
>>>> make it easy for computational tasks executed by [math] code to be
>>>> managed by external concurrency frameworks, e.g. Hadoop.
>>> In the context of Commons Math, we have often heard that "no
>>> dependency" is good. So it is also good not to impose _implicit_
>>> dependencies (like: "If you use Hadoop, you could have better
>>> performance"). In a way, the CM development "model" is: "We provide
>>> a toolkit of efficient procedures, and you, the user, get top
>>> performance (on a best-effort basis, of course)."
>>> If we can provide better performance through multi-threading, why
>>> not?
>>> Nobody will be forced to use it: they will use the "basic"
>>> (sequential) tasks, or set the "parallelizationLevel" setting to 1.
>>>
>>> Gilles
>>>
>>>> Phil
>>>>> If the requirement is to only depend on the standard JDK, the
>>>>> framework is in
>>>>>   java.util.concurrent
>>>>> and all we need to do is to define "tasks" that can be "submitted"
>>>>> to an executor:
>>>>>
>>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/AbstractExecutorService.html#submit(java.util.concurrent.Callable)
>>>>>
>>>>>
>>>>> Regards,
>>>>> Gilles
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org