From: "Niketan Pansare"
To: dev@systemml.incubator.apache.org
Date: Wed, 25 May 2016 08:08:18 -0700
Subject: Re: Discussion on GPU backend

Thanks Berthold and Matthias for your suggestions. It is important to note that whether we go with (A) or (B), the initial PR will be squashed into one commit, and the individual commits by external contributors will be lost in the process. However, since we are planning to go with option (3), the impact won't be too severe.

Matthias: here are my thoughts regarding the unknowns for the GPU backend:

1. Handling of native libraries:
Both JCuda and Nvidia provide shared libraries/DLLs for most OS/platforms, along with installation instructions.

For deployment:
As per the previous email, the native libraries will be treated as an external dependency, just like hadoop/spark.
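To make the failure modes described below concrete, here is a minimal sketch of the kind of guarded probe this external-dependency treatment implies. The AcceleratorProbe helper is hypothetical, not actual SystemML code; only the jcuda.runtime.JCuda class and its cudaSetDevice method come from JCuda itself:

    import java.lang.reflect.Method;

    // Hypothetical sketch: check whether the GPU backend can be enabled.
    public class AcceleratorProbe {
        public static boolean jcudaUsable() {
            try {
                // Throws ClassNotFoundException if JCu*.jar is not on the classpath.
                Class<?> jcuda = Class.forName("jcuda.runtime.JCuda");
                // Invoking a JCuda method triggers the native library load;
                // throws UnsatisfiedLinkError ("Cannot load ...") if the
                // JCu*.dll/.so files, CUDA, or CuDNN are not installed.
                Method setDevice = jcuda.getMethod("cudaSetDevice", int.class);
                setDevice.invoke(null, 0);
                return true;
            } catch (ClassNotFoundException e) {
                return false; // jar missing: the "Class not found" case
            } catch (UnsatisfiedLinkError e) {
                return false; // jar present, native libraries missing: "Cannot load .."
            } catch (ReflectiveOperationException e) {
                return false; // any other reflection failure
            }
        }
    }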
For example, if someone executes "hadoop jar SystemML.jar -f test.dml -exec hybrid_spark", she will get a "Class Not Found" exception. In similar fashion, if the user does not include JCu*.jar or does not provide the native libraries (JCu*.dll/.so, CUDA, or CuDNN) and supplies the "-accelerator" flag, a "Class not found" or a "Cannot load .." exception will be thrown, respectively. If the user does not supply the "-accelerator" flag, SystemML will proceed with normal execution as it does today.

For dev:
We are planning to host the jcu*.jars in a maven repository. Once that's done, the "system" scope in the pom will be replaced by the "provided" scope and the jcu*.jars will be deleted from the PR. As with deployment, it is the responsibility of the developer to install the native libraries if she intends to work on the GPU backend.

For testing:
The user can set the environment variable "CUDA_PATH" and set the TEST_GPU flag to enable GPU tests (please see https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113). The PR will be accompanied by additional tests, which will be enabled only when TEST_GPU is set. Having the TEST_GPU flag allows users without an Nvidia GPU to run the integration tests. As with deployment, it is the responsibility of the developer to install the native libraries for testing with the TEST_GPU flag.

The first version will not contain custom native kernels.

2. I can add the summary of the performance comparisons in the PR :)

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Berthold Reinwald/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 05/25/2016 06:03 AM
Subject: Re: Discussion on GPU backend

The discussion is less about (1), (2), or (3). As practiced so far, (3) is the way to go.

The question is about (A) or (B). Curious what the Apache-suggested practice is.

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinwald@us.ibm.com

From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 05/24/2016 09:10 PM
Subject: Re: Discussion on GPU backend

Generally, I think we should really stick to (3) as done in the past, i.e., bring up major features in the roadmap discussions, create jira epics, and try to break them into rather isolated tasks. This works for almost any major/minor feature. The only exceptions are features where it is initially unknown whether the potential benefits outweigh the increased complexity (or other disadvantages). Here, prototypes are required, but everybody should be free to choose a way of maintaining them. I also don't expect too much collaboration here because of the unknown status. Once the initial unknowns are resolved, we should come back to (3), though.

Regarding the GPU backend, the unknowns to resolve are (1) the handling of native libraries/kernels for deployment/test/dev, and (2) performance comparisons on selected algorithms (prototypes, not fully integrated), data sizes, and platforms. Once we have answers to these questions, we can create all the tasks for optimizer/runtime integration.

Regards,
Matthias
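As a concrete illustration of the TEST_GPU guard described under "For testing" above, a minimal JUnit 4 sketch might look as follows; the test class, the environment-variable check, and the guarded body are hypothetical, not actual SystemML tests:

    import static org.junit.Assume.assumeTrue;
    import org.junit.Test;

    // Hypothetical sketch of an opt-in GPU test.
    public class MatrixMultGPUTest {
        private static boolean gpuTestsEnabled() {
            // GPU tests run only when the developer opts in via TEST_GPU
            // (and has installed the native libraries, e.g., under CUDA_PATH).
            return System.getenv("TEST_GPU") != null;
        }

        @Test
        public void testMatrixMultOnGPU() {
            // Skipped (not failed) on machines without an Nvidia GPU.
            assumeTrue(gpuTestsEnabled());
            // ... run the DML script with -accelerator and compare results ...
        }
    }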
From: Niketan Pansare/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 05/24/2016 11:55 AM
Subject: Re: Discussion on GPU backend

Hi all,

Since there is interest in collaborating on the GPU backend, I wanted to know what the preferred way is to go ahead with a new feature (i.e., the GPU backend). This discussion is also generally applicable to other major features (for example: the Flink backend, deep learning support, the OpenCL backend, new data types, new built-in functions, new algorithms, etc.).

The first point of discussion is what would qualify as a "major feature" and how we integrate it into SystemML. Here are three options that could serve as a potential requirement:
1. The feature has to be fully functional and fully optimized. For example, in the case of additional backends, the PR can be merged only if all the instructions (CP or distributed) have been implemented and are at least as optimized as our existing alternate backends. In the case of algorithms or built-in functions, the PR can be merged only if it runs on all the backends for all datasets and is comparable in performance and accuracy with external ML libraries.
2. The feature has to be fully functional. In this case, the PR can be merged only if all the instructions (CP or distributed) have been implemented. However, the first version of the new backend need not perform faster than our existing alternate backends.
3. Incremental addition, but with unit test cases that address quality and stability concerns. In this case, a PR can be merged if a subset of instructions has been implemented, along with a set of unit test cases suggested by our committers. The main benefit here is quick-feedback iterations from our committers, whereas the main drawback is an intermediate state where we don't fully support the given backend for certain scenarios.

If we decide to go with option 1 or 2, then potentially there will be a lot of code to review at the end, and ideally we should give our committers the opportunity to provide early review comments on the feature. This will mitigate the risk of having to re-implement the entire feature. The options here are:
A. Create a branch on https://github.com/apache/incubator-systemml. This allows people to collaborate, and it allows committers to look at the code.
B. Create a branch on a fork and have a PR up, to allow committers to raise concerns and provide suggestions. This is done for https://github.com/apache/incubator-systemml/pull/165 and https://github.com/apache/incubator-systemml/pull/119. To collaborate, the person creating the PR will act as committer for the feature, will accept PRs on their branch, and will be responsible for resolving conflicts and keeping the PR in sync with master.

If we decide to go with option 3 (i.e., incremental addition), option B seems to be the logical choice, as we already do this for other features.

My goal here is not to create a formal process but instead to avoid any potential misunderstanding/confusion and also to follow recommended Apache practices.

Please email back with your thoughts :)

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
From: Deron Eriksson
To: dev@systemml.incubator.apache.org
Date: 05/18/2016 11:22 AM
Subject: Re: Discussion on GPU backend

Hi Niketan,

Good idea, I think that would be the cleanest solution for now. Since JCuda doesn't appear to be in a public maven repo, it adds a layer of difficulty to clean integration via maven builds.

Deron

On Wed, May 18, 2016 at 10:55 AM, Niketan Pansare wrote:

> Hi Deron,
>
> Good points. I vote that we keep JCuda and other accelerators we add as an
> external dependency. This means the user will have to ensure that JCuda.jar is
> in the classpath and that JCuda.dll/JCuda.so is in the LD_LIBRARY_PATH.
>
> I don't think JCuda.jar is platform-specific.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Deron Eriksson
> To: dev@systemml.incubator.apache.org
> Date: 05/18/2016 10:51 AM
> Subject: Re: Discussion on GPU backend
>
> Hi,
>
> I'm wondering what would be a good way to handle JCuda in terms of the
> build release packages. Currently we have 11 artifacts that we are
> building:
>   systemml-0.10.0-incubating-SNAPSHOT-inmemory.jar
>   systemml-0.10.0-incubating-SNAPSHOT-javadoc.jar
>   systemml-0.10.0-incubating-SNAPSHOT-sources.jar
>   systemml-0.10.0-incubating-SNAPSHOT-src.tar.gz
>   systemml-0.10.0-incubating-SNAPSHOT-src.zip
>   systemml-0.10.0-incubating-SNAPSHOT-standalone.jar
>   systemml-0.10.0-incubating-SNAPSHOT-standalone.tar.gz
>   systemml-0.10.0-incubating-SNAPSHOT-standalone.zip
>   systemml-0.10.0-incubating-SNAPSHOT.jar
>   systemml-0.10.0-incubating-SNAPSHOT.tar.gz
>   systemml-0.10.0-incubating-SNAPSHOT.zip
>
> It looks like JCuda is platform-specific, so you typically need different
> jars/dlls/sos/etc. for each platform. If I'm understanding things correctly,
> if we generated Windows/Linux/LinuxPowerPC/MacOS-specific SystemML
> artifacts for JCuda, we'd potentially have an enormous number of artifacts.
>
> Is this something that could potentially be handled by specific profiles in
> the pom, so that a user might be able to do something like "mvn clean
> package -P jcuda-windows" and be responsible for building
> the platform-specific SystemML jar for jcuda? Or is this something that
> could be handled differently, by putting the platform-specific jcuda jar on
> the classpath and any dlls or other needed libraries on the path?
>
> Deron
>
> On Tue, May 17, 2016 at 10:50 PM, Niketan Pansare wrote:
>
> > Hi Luciano,
> >
> > Like all our backends, there is no change in the programming model. The
> > user submits a DML script and specifies whether she wants to use an
> > accelerator. Assuming that we compile the jcuda jars into SystemML.jar, the
> > user can use the GPU backend with the following command:
> > spark-submit --master yarn-client ... -f MyAlgo.dml -accelerator -exec hybrid_spark
> >
> > The user also needs to set LD_LIBRARY_PATH to point to the JCuda DLL or .so
> > files. Please see https://issues.apache.org/jira/browse/SPARK-1720 ...
> > For example, the user can add the following to spark-env.sh:
> > export LD_LIBRARY_PATH=<path to jcuda so>:$LD_LIBRARY_PATH
> >
> > The first version of the GPU backend will only accelerate CP. In this case, we
> > have four types of instructions:
> > 1. CP
> > 2. GPU (requires a GPU on the driver)
> > 3. SPARK
> > 4. MR
> >
> > Note, the first version will require the CUDA/JCuda dependency to be
> > installed on the driver only.
> >
> > The next version will accelerate our distributed instructions as well. In
> > this case, we will have six types of instructions:
> > 1. CP
> > 2. GPU
> > 3. SPARK
> > 4. MR
> > 5. SPARK-GPU (requires a GPU cluster)
> > 6. MR-GPU (requires a GPU cluster)
> >
> > Thanks,
> >
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> >
> > From: Luciano Resende
> > To: dev@systemml.incubator.apache.org
> > Date: 05/17/2016 09:13 PM
> > Subject: Re: Discussion on GPU backend
> >
> > Great to see detailed information on this topic Niketan; I guess I
> > missed it when you posted it initially.
> >
> > Could you elaborate a little more on what the programming model is when
> > the user wants to leverage the GPU? Also, today I can submit a job to spark
> > using --jars and it will handle copying the dependencies to the worker
> > nodes. If my application wants to leverage the GPU, what extra dependencies
> > will be required on the worker nodes, and how are they going to be
> > installed/updated on the Spark cluster?
> >
> > On Tue, May 3, 2016 at 1:26 PM, Niketan Pansare wrote:
> >
> > > Hi all,
> > >
> > > I have updated the design document for our GPU backend in the JIRA:
> > > https://issues.apache.org/jira/browse/SYSTEMML-445. The implementation
> > > details are based on the prototype I created, which is available in PR
> > > https://github.com/apache/incubator-systemml/pull/131. Once we are done
> > > with the discussion, I can clean up and separate out the GPU backend in a
> > > separate PR for easier review :)
> > >
> > > Here are the key design points:
> > > A GPU backend would implement two abstract classes:
> > >    1. GPUContext
> > >    2. GPUObject
> > >
> > > The GPUContext is responsible for GPU memory management and gets call-backs
> > > from SystemML's bufferpool on the following methods:
> > >    1. void acquireRead(MatrixObject mo)
> > >    2. void acquireModify(MatrixObject mo)
> > >    3. void release(MatrixObject mo, boolean isGPUCopyModified)
> > >    4. void exportData(MatrixObject mo)
> > >    5. void evict(MatrixObject mo)
> > >
> > > A GPUObject (like RDDObject and BroadcastObject) is stored in the CacheableData
> > > object. It contains the following methods that are called back from the
> > > corresponding GPUContext:
> > >    1. void allocateMemoryOnDevice()
> > >    2. void deallocateMemoryOnDevice()
> > >    3. long getSizeOnDevice()
> > >    4. void copyFromHostToDevice()
> > >    5. void copyFromDeviceToHost()
> > >
> > > In the initial implementation, we will add JCudaContext and JCudaPointer
> > > that will extend the above abstract classes, respectively.
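Taken together, a minimal sketch of the two abstract classes listed above (method signatures as given in the design points; the class layout and the MatrixObject import path are assumptions, not the final SystemML code):

    import org.apache.sysml.runtime.controlprogram.caching.MatrixObject;

    // Sketch of the proposed GPU memory-management abstraction.
    public abstract class GPUContext {
        // Call-backs from SystemML's bufferpool:
        public abstract void acquireRead(MatrixObject mo);
        public abstract void acquireModify(MatrixObject mo);
        public abstract void release(MatrixObject mo, boolean isGPUCopyModified);
        public abstract void exportData(MatrixObject mo);
        public abstract void evict(MatrixObject mo);
    }

    // Sketch of the per-matrix device handle, stored in CacheableData
    // (like RDDObject and BroadcastObject) and called back from the
    // corresponding GPUContext.
    abstract class GPUObject {
        abstract void allocateMemoryOnDevice();
        abstract void deallocateMemoryOnDevice();
        abstract long getSizeOnDevice();
        abstract void copyFromHostToDevice();
        abstract void copyFromDeviceToHost();
    }

A JCuda-based backend would then supply the JCudaContext and JCudaPointer subclasses described next.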
> > > The JCudaContext will be created by the ExecutionContextFactory depending on the
> > > user-specified accelerator. Analogous to MR/SPARK/CP, we will add a new
> > > ExecType, GPU, and implement GPU instructions.
> > >
> > > The above design is general enough that other people can implement
> > > custom accelerators (for example, OpenCL), and it also follows the design
> > > principles of our CP bufferpool.
> > >
> > > Thanks,
> > >
> > > Niketan Pansare
> > > IBM Almaden Research Center
> > > E-mail: npansar At us.ibm.com
> > > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/