systemml-dev mailing list archives

From Nakul Jindal <>
Subject Re: DML in Zeppelin
Date Tue, 12 Apr 2016 09:10:53 GMT
Hi All,

Niketan, this feedback is much appreciated and I will continue to work on
this. In the meantime, some of the other (offline) feedback I got for this
included making DML variables accessible across DML cells. Towards that
end, I've made some improvements to the Zeppelin-DML integration. There is
also a convenient (albeit large, ~2 GB) Docker image to test this out with.

All the information is on the JIRA:
It has screenshots, Docker instructions, and steps to recreate the dev
environment to play with.

These are the features (thus far):

Launch a standalone DML cell, which runs the DML interpreter locally
- This has rudimentary features and will be developed further if there is demand

Launch a DML cell which runs on Spark (using %spark.dml)
- Transfer data between Spark, PySpark, etc., and DML cells (as DataFrames)
      -- Read data in a Spark cell (as a DataFrame) and use it in a DML cell
      -- Write a DML matrix in a DML cell and read it as a DataFrame in a
Spark Cell
      -- This is done using ZeppelinContext
- Transfer data between DML cells - scalar types (booleans, strings,
floats, integers) and matrices
      -- Any variable defined in a cell can be used (read from/written to)
in subsequent cells.
      -- This is very similar to how Spark cells operate.
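To make the cross-cell get/put flow concrete, here is a minimal Python sketch. The ZeppelinContext class below is a dict-backed stand-in written purely for illustration (it is not the real Zeppelin class), and the "cells" are marked with comments:

```python
# Minimal dict-backed stand-in for Zeppelin's shared "z" object.
# This is an illustrative sketch, NOT the real ZeppelinContext class.
class ZeppelinContext:
    def __init__(self):
        self._store = {}

    def put(self, name, value):
        # In a real notebook this registers the value in the
        # interpreter group's shared resource pool.
        self._store[name] = value

    def get(self, name):
        return self._store[name]

z = ZeppelinContext()

# --- cell 1 (e.g. a Spark cell): publish a value ---
z.put("nrows", 150)

# --- cell 2 (e.g. a DML cell): read it back ---
nrows = z.get("nrows")
print(nrows)  # prints 150
```

In the actual integration, "z" is injected into each interpreter's environment by Zeppelin, so no setup line is needed inside a cell.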

Any feedback is greatly appreciated.

Nakul Jindal

On Tue, Mar 8, 2016 at 10:30 AM, Niketan Pansare <> wrote:

> Hi Nakul,
> This is good work !
> My 2 cents, we should add missing features (such as command-line
> arguments), document the API for this POC, come up with examples for
> existing algorithms with open-source datasets and put them in
> This way, people are encouraged to try out (and maybe even modify on the
> fly) existing DML algorithms with specific datasets. Borrowing
> an example from
> >>> from sklearn import datasets
> >>> iris = datasets.load_iris()
> >>> digits = datasets.load_digits()
> >>> from sklearn import svm
> >>> clf = svm.SVC(gamma=0.001, C=100.)
> >>>[:-1], digits.target[:-1])
> >>> clf.predict([-1:])
> We can then put a link to the given example in
> Thanks,
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At
> From: Nakul Jindal <>
> To:
> Date: 03/06/2016 07:22 PM
> Subject: DML in Zeppelin
> ------------------------------
> Hi,
> I've put together a proof of concept for having DML be a first class
> citizen in Apache Zeppelin.
> Brief intro to Zeppelin -
> Zeppelin is a "notebook" interface to interact with Spark, Cassandra, Hive
> and other projects. It can be thought of as a REPL in a browser.
> Small units of code are put into "cell"s. These individual "cells" can then
> be run interactively. Of course, there is support for queuing up and
> running cells in parallel.
> Cells are contained in notebooks. Notebooks can be exported and are
> persistent between sessions.
> One can write (Scala) Spark code in cell 1 and save a DataFrame object,
> then write PySpark code in cell 2 and access the previously saved
> DataFrame.
> The Zeppelin runtime does this by injecting a special variable called "z"
> into the Spark and PySpark environments in Zeppelin. This "z" is an object
> of type ZeppelinContext that exposes a "get" and a "put" method.
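> As a sketch (not syntax taken from this POC), two cells sharing a
> DataFrame through "z" might look like the following, where %spark and
> %pyspark are Zeppelin's interpreter directives:
>
> ```
> %spark
> // cell 1: a Scala Spark cell publishes a DataFrame
> val df ="iris.csv")
> z.put("df", df)
>
> %pyspark
> # cell 2: a PySpark cell reads it back
> df = z.get("df")
> df.count()
> ```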
> DML in Spark mode can now access this feature as well.
> In this POC, DML can operate in 2 modes - standalone and spark.
> Screenshots of it working:
> GIF of the screenshots:
> Instructions:
> Nakul Jindal
