systemml-dev mailing list archives

From "Niketan Pansare" <npan...@us.ibm.com>
Subject Re: DML in Zeppelin
Date Tue, 08 Mar 2016 18:30:13 GMT

Hi Nakul,

This is good work!

My 2 cents: we should add the missing features (such as command-line
arguments), document the API for this POC, come up with examples that run
existing algorithms on open-source datasets, and put them in
https://github.com/apache/incubator-systemml/tree/master/samples/zeppelin-notebooks

This way, people are encouraged to try out (and maybe even modify on the
fly) the existing DML algorithms with specific datasets. Borrowing an
example from http://scikit-learn.org/stable/tutorial/basic/tutorial.html:
>>> from sklearn import datasets, svm
>>> digits = datasets.load_digits()
>>> clf = svm.SVC(gamma=0.001, C=100.)
>>> clf.fit(digits.data[:-1], digits.target[:-1])  # train on all but the last sample
>>> clf.predict(digits.data[-1:])                  # classify the held-out sample

We can then put a link to the given example in
http://apache.github.io/incubator-systemml/algorithms-classification.html#support-vector-machines

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:	Nakul Jindal <nakul02@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	03/06/2016 07:22 PM
Subject:	DML in Zeppelin



Hi,

I've put together a proof of concept for making DML a first-class
citizen in Apache Zeppelin.

Brief intro to Zeppelin -
Zeppelin is a "notebook" interface for interacting with Spark, Cassandra,
Hive, and other projects. It can be thought of as a REPL in a browser.
Small units of code are put into "cells". These individual cells can then
be run interactively; there is also support for queueing up cells and
running them in parallel.
Cells are contained in notebooks. Notebooks can be exported and persist
between sessions.

One can type (Scala) Spark code in cell 1 and save a data frame object,
then access the previously saved data frame from PySpark code in cell 2.
The Zeppelin runtime makes this possible by injecting a special variable
called "z" into the Spark and PySpark environments. This "z" is an object
of type ZeppelinContext and exposes "get" and "put" methods.
DML in Spark mode can now access this feature as well.
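To make the sharing mechanism concrete, here is a minimal stand-alone sketch of the get/put contract described above. This is an illustration only: the class name `ZeppelinContextSketch` and its dict-backed store are hypothetical, and the real ZeppelinContext object only exists inside the Zeppelin runtime, where it is shared across interpreter cells.

```python
# Hypothetical stand-in for Zeppelin's shared "z" context (illustration only;
# the real ZeppelinContext is provided by the Zeppelin runtime, not by user code).
class ZeppelinContextSketch:
    def __init__(self):
        self._store = {}

    def put(self, name, value):
        # Cell 1 (e.g. Scala Spark) publishes an object under a name.
        self._store[name] = value

    def get(self, name):
        # Cell 2 (e.g. PySpark, or DML in this POC) retrieves it by name.
        return self._store[name]

z = ZeppelinContextSketch()
z.put("df", {"rows": 3, "cols": 2})  # stand-in for a shared data frame
shared = z.get("df")                 # a later cell reads the same object
```

In the real system the two calls would run in different cells, possibly in different languages, with the Zeppelin runtime holding the single shared context between them.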

In this POC, DML can operate in two modes: standalone and Spark.

Screenshots of it working:
http://imgur.com/a/m7ASx

GIF of the screenshots:
http://i.imgur.com/NttMuKC.gifv

Instructions:
https://gist.github.com/anonymous/6ab8c569b2360232e252

JIRA:
https://issues.apache.org/jira/browse/SYSTEMML-542


Nakul Jindal


