systemml-dev mailing list archives

From Deron Eriksson <deroneriks...@gmail.com>
Subject Re: Simplification of MLContext and related APIs
Date Mon, 12 Sep 2016 21:52:52 GMT
Hi Matthias,

Great! I would be very happy to see BinaryBlockMatrix incorporated into
Matrix and BinaryBlockFrame incorporated into Frame since this would be a
welcome simplification of the API. Reducing the API to the essential
concepts is a big win for our users. This would have already happened if I
had the depth of knowledge of SystemML required to make this happen in a
reasonable timeframe.

I would definitely approve of further extracting Matrix and Frame to a
common type if this can be done in a way that feels natural for the end
user. At this point I can't really explain it further, but if I expect to
get back a matrix of numbers, I want this to feel natural, and if I get
back a frame consisting of columns of different data types, I want this to
feel natural too. I want our end users to put in data and get out results
in a minimum number of steps that feel intuitive. By the way, I think we
are getting very close, which is a great sign!

Deron


On Mon, Sep 12, 2016 at 2:21 PM, Matthias Boehm <mboehm@us.ibm.com> wrote:

> great - then we're all on the same page. Let me just clarify two aspects:
> First, I think we do need abstract frame/matrix data types at API level,
> but just one type that is used consistently across MLContext and all DSLs
> we're about to add. Second, relying on a common compilation chain does not
> directly affect users but ensures consistent behavior across all APIs.
>
> So the bottom line is, we're going to remove MatrixObject/FrameObject and
> other internal structures from API level, remove the BinaryBlockMatrix/BinaryBlockFrame
> types, and try to consolidate the various Matrix/Frame objects as well as
> replicated compilation chains.
>
> Regards,
> Matthias
>
> From: Deron Eriksson <deroneriksson@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 09/12/2016 01:56 PM
> Subject: Re: Simplification of MLContext and related APIs
> ------------------------------
>
>
>
> Feel free to not expose MatrixObject and FrameObject. I am fine with that.
> The only reason MatrixObject and FrameObject are exposed is that I felt if
> the new MLContext API did not expose them, there would be complaints from
> existing committers that these objects were not available. I can't see
> anyone outside of SystemML core developers caring about MatrixObject and
> FrameObject, or for that matter ever using these classes. Users
> want DataFrames, DataSets, RDDs, 2D arrays, CSV files, or practically
> anything but a MatrixObject or FrameObject.
>
> If you remove entities such as Matrix and Frame, you have the older
> MLContext API. Perhaps users who don't wish to use objects such as Matrix
> and Frame can use the older API since these suggestions are already built
> into the old API?
>
> Deron
>
>
> On Mon, Sep 12, 2016 at 1:22 PM, Mike Dusenberry <dusenberrymw@gmail.com>
> wrote:
>
> > I also agree that internal data structures shouldn't be exposed to a
> > user.  However, I think we definitely need to keep the `Matrix` and
> > `Frame` types in the API, in agreement with Arvind.  The main purpose of
> > SystemML for a user is to allow for machine learning algorithms
> > involving matrices to be run on a given system (laptop, Spark cluster,
> > etc.).  Anything involving a compilation chain directly is noise for our
> > ML users.  Thus it's quite useful for SystemML to expose a `Matrix` type
> > with a limited API as is currently done in MLContext.  This allows a
> > user to interact with SystemML via these `Matrix` objects which
> > abstractly represent the core data structure of a SystemML script.
> > Furthermore, these `Matrix` objects can be used as subsequent input to
> > an additional script, or can be converted to a DataFrame once the user
> > is ready to continue interacting with Spark.  As Arvind mentioned, this
> > just allows the DML `Matrix` type to be effectively exposed at the API
> > level as well.  Additionally, we plan to unify this `Matrix` type with
> > the lazy matrix types we are creating in the Python and Scala DSLs, thus
> > allowing `Matrix` to be the equivalent of matrices in DML.  A similar
> > argument applies to `Frame` as well.
> >
> > I think that limiting the exposure of internal structures to users could
> > be useful, but removing `Matrix` & `Frame` and instead having a user
> > deal directly with compilation chains would be a step backwards.
> >
> > - Mike
> >
> > --
> >
> > Michael W. Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > On Sun, Sep 11, 2016 at 5:52 PM, Acs S <acs_s@yahoo.com.invalid> wrote:
> >
> > > Yes, I agree that we should NOT expose any internal objects at API
> > > level. Objects like FrameObject, MatrixObject should not be exposed as
> > > those are internal objects.
> > > Rule of thumb should be if object (Frame, Object or Scalar) is exposed
> > > at DML level it should be exposed at MLContext level. If there is need
> > > to add any extra object besides those exposed in DML it should be
> > > justifiable with rationale.
> > > I have introduced FrameObject as an oversight. It should have been a
> > > private method instead of a public method. I can fix it soon. But there
> > > are more changes you have proposed; I will let Deron respond.
> > > Thanks for catching these issues.
> > > -Arvind
> > >
> > >       From: Matthias Boehm <mboehm@us.ibm.com>
> > >  To: dev <dev@systemml.incubator.apache.org>
> > >  Sent: Sunday, September 11, 2016 9:43 AM
> > >  Subject: Simplification of MLContext and related APIs
> > >
> > >
> > >
> > > It's great to see the ongoing progress on MLContext and related APIs.
> > > However, one aspect that really concerns me is the creation of many
> > > redundant data types and exposition of various internal data
> > > structures. For example, exposing MatrixObject and FrameObject at API
> > > level is dangerous because it makes external programs data-dependent on
> > > internal structures that might be subject to change (no API stability)
> > > and users might not be aware of the implications their interactions
> > > have on the buffer pool etc. Furthermore, having such a plethora of
> > > entry points makes it very hard to ensure consistency of the
> > > compilation chain with regard to configuration handling, environment
> > > setup and advanced compilation techniques.
> > >
> > > I would recommend creating a holistic design across the various APIs
> > > that aims to (1) reduce the number of exposed data types (for instance,
> > > I would like to remove MatrixObject/FrameObject from the external
> > > interface, as well as remove BinaryBlockMatrix, BinaryBlockFrame,
> > > Matrix, Frame, and related meta data objects), and (2) create a
> > > configurable compilation chain that is invoked from all external APIs.
> > > I understand that these data types were introduced to simplify, for
> > > example, imports in user programs but I'm sure we can find an
> > > alternative realization with less redundancy. What do you think?
> > >
> > > Regards,
> > > Matthias
> > >
> > >
> > >
> >
>
>
>
