Mailing-List: contact dev-help@systemml.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@systemml.incubator.apache.org
MIME-Version: 1.0
Subject: Re: Simplification of MLContext and related APIs
To: dev@systemml.incubator.apache.org
From: "Matthias Boehm"
Date: Mon, 12 Sep 2016 23:21:48 +0200
References: <1742075068.3375305.1473641550865@mail.yahoo.com>
Content-type: multipart/related; Boundary="0__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77"
archived-at: Mon, 12 Sep 2016 21:22:08 -0000

--0__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77
Content-type: multipart/alternative; Boundary="1__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77"

--1__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77
Content-Transfer-Encoding: quoted-printable
Content-type: text/plain; charset=US-ASCII

great - then we're all on the same page. Let me just clarify two aspects:
First, I think we do need abstract frame/matrix data types at API level,
but just one type that is used consistently across MLContext and all DSLs
we're about to add. Second, relying on a common compilation chain does not
directly affect users but ensures consistent behavior across all APIs.

So the bottom line is, we're going to remove MatrixObject/FrameObject and
other internal structures from API level, remove the
BinaryBlockMatrix/BinaryBlockFrame types, and try to consolidate the
various Matrix/Frame objects as well as replicated compilation chains.

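To make the bottom line concrete, here is a minimal Python sketch. It assumes nothing about SystemML's actual classes: `Matrix`, `_InternalMatrixObject`, `run_chain`, `MLContextLike`, and `dsl_transpose` are all hypothetical names chosen for illustration. It shows one public matrix type that hides an internal structure, and two entry points that both go through a single shared execution path.

```python
from typing import Callable, Dict, List

Rows = List[List[float]]


class _InternalMatrixObject:
    """Hypothetical stand-in for an internal runtime structure."""

    def __init__(self, rows: Rows) -> None:
        self._rows = [list(r) for r in rows]

    def read(self) -> Rows:
        # Return a copy so callers cannot mutate internal state.
        return [list(r) for r in self._rows]


class Matrix:
    """The one public matrix type (hypothetical); wraps the internal object."""

    def __init__(self, rows: Rows) -> None:
        self._obj = _InternalMatrixObject(rows)  # never handed out to callers

    def to_2d_array(self) -> Rows:
        return self._obj.read()


def run_chain(script: Callable[[Rows], Rows], x: Matrix,
              config: Dict[str, str]) -> Matrix:
    """Single execution path shared by every entry point (hypothetical)."""
    # A real chain would parse, optimize and execute using `config`;
    # this sketch simply calls the script on the matrix contents.
    return Matrix(script(x.to_2d_array()))


class MLContextLike:
    """Hypothetical programmatic API that delegates to the shared chain."""

    def __init__(self, config: Dict[str, str]) -> None:
        self._config = config

    def execute(self, script: Callable[[Rows], Rows], x: Matrix) -> Matrix:
        return run_chain(script, x, self._config)


def dsl_transpose(x: Matrix, config: Dict[str, str]) -> Matrix:
    """Hypothetical DSL operation using the same chain and the same type."""
    return run_chain(lambda rows: [list(c) for c in zip(*rows)], x, config)
```

Because both entry points accept and return the same `Matrix`, the result of `MLContextLike.execute` can be passed straight to `dsl_transpose` without the caller ever seeing `_InternalMatrixObject`.
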
Regards,
Matthias


From: Deron Eriksson <deroneriksson@gmail.com>
To: dev@systemml.incubator.apache.org
Date: 09/12/2016 01:56 PM
Subject: Re: Simplification of MLContext and related APIs


Feel free to not expose MatrixObject and FrameObject. I am fine with that.
The only reason MatrixObject and FrameObject are exposed is that I felt if
the new MLContext API did not expose them, there would be complaints from
existing committers that these objects were not available. I can't see
anyone outside of SystemML core developers caring about MatrixObject and
FrameObject or, for that matter, ever using these classes. Users want
DataFrames, DataSets, RDDs, 2D arrays, CSV files, or practically anything
but a MatrixObject or FrameObject.

If you remove entities such as Matrix and Frame, you have the older
MLContext API. Perhaps users who don't wish to use objects such as Matrix
and Frame can use the older API since these suggestions are already built
into the old API?

Deron


On Mon, Sep 12, 2016 at 1:22 PM, Mike Dusenberry <dusenberrymw@gmail.com>
wrote:

> I also agree that internal data structures shouldn't be exposed to a user.
> However, I think we definitely need to keep the `Matrix` and `Frame` types
> in the API, in agreement with Arvind. The main purpose of SystemML for a
> user is to allow for machine learning algorithms involving matrices to be
> run on a given system (laptop, Spark cluster, etc.). Anything involving a
> compilation chain directly is noise for our ML users. Thus it's quite
> useful for SystemML to expose a `Matrix` type with a limited API as is
> currently done in MLContext. This allows a user to interact with SystemML
> via these `Matrix` objects which abstractly represent the core data
> structure of a SystemML script. Furthermore, these Matrix objects can be
> used as subsequent input to an additional script, or can be converted to a
> DataFrame once the user is ready to continue interacting with Spark. As
> Arvind mentioned, this just allows the DML `Matrix` type to be effectively
> exposed at the API level as well. Additionally, we plan to unify this
> `Matrix` type with the lazy matrix types we are creating in the Python and
> Scala DSLs, thus allowing `Matrix` to be the equivalent of matrices in
> DML. A similar argument applies to `Frame` as well.
>
> I think that limiting the exposure of internal structures to users could be
> useful, but removing `Matrix` & `Frame` and instead having a user deal
> directly with compilation chains would be a step backwards.
>
> - Mike
>
> --
>
> Michael W. Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> On Sun, Sep 11, 2016 at 5:52 PM, Acs S <acs_s@yahoo.com.invalid> wrote:
>
> > Yes, I agree that we should NOT expose any internal objects at the API
> > level. Objects like FrameObject and MatrixObject should not be exposed,
> > as those are internal objects.
> > The rule of thumb should be: if an object (Frame, Object or Scalar) is
> > exposed at the DML level, it should be exposed at the MLContext level.
> > If there is a need to add any extra object beyond those exposed in DML,
> > it should be justified with a rationale.
> > I introduced FrameObject as an oversight. It should have been a private
> > method instead of a public method. I can fix it soon. But there are more
> > changes you have proposed; I will let Deron respond to those.
> > Thanks for catching these issues.
> > -Arvind
> >
> > From: Matthias Boehm <mboehm@us.ibm.com>
> > To: dev <dev@systemml.incubator.apache.org>
> > Sent: Sunday, September 11, 2016 9:43 AM
> > Subject: Simplification of MLContext and related APIs
> >
> > It's great to see the ongoing progress on MLContext and related APIs.
> > However, one aspect that really concerns me is the creation of many
> > redundant data types and the exposure of various internal data
> > structures. For example, exposing MatrixObject and FrameObject at the
> > API level is dangerous because it makes external programs data-dependent
> > on internal structures that might be subject to change (no API
> > stability), and users might not be aware of the implications their
> > interactions have on the buffer pool etc. Furthermore, having such a
> > plethora of entry points makes it very hard to ensure consistency of the
> > compilation chain with regard to configuration handling, environment
> > setup and advanced compilation techniques.
> >
> > I would recommend creating a holistic design across the various APIs
> > that aims to (1) reduce the number of exposed data types (for instance,
> > I would like to remove MatrixObject/FrameObject from the external
> > interface, as well as remove BinaryBlockMatrix, BinaryBlockFrame,
> > Matrix, Frame, and related metadata objects), and (2) create a
> > configurable compilation chain that is invoked from all external APIs.
> > I understand that these data types were introduced to simplify, for
> > example, imports in user programs, but I'm sure we can find an
> > alternative realization with less redundancy. What do you think?
> >
> > Regards,
> > Matthias

--1__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77
Content-Transfer-Encoding: quoted-printable
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline

--1__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77-- --0__=8FBB0ABFDFE7BB778f9e8a93df938690918c8FBB0ABFDFE7BB77--