systemml-dev mailing list archives

Subject Re: Proof of Concept: Embedded Scala DSL
Date Tue, 27 Sep 2016 01:49:33 GMT
Hi Matthias,

thanks for taking a look at the document!
Let me try to answer your questions with some ideas - part of this POC 
and my current work is to find out what the best answers are!

1) I see basically two use cases for this DSL:
     - users write functions/algorithms much like prepared statements in 
SQL (defining functions `def fun(a: T, b: U) = parallelize { ... }` and 
executing them later)
     - users interactively submit snippets to SystemML (using `val A = 
parallelize { C %*% D } execute()` and directly executing)
In general, we should probably offer a write() primitive like in DML 
that persists the data on the filesystem. In the second case it's not 
quite clear to me what would be the best option right now. Intuitively I 
would want the result to be of the same type that my initial DSL 
expression was. If I multiply two matrices for example, I would want a 
Matrix (DSL Type) as a result. Ideally, I would not have to care about 
what underlying representation the actual matrix has and could just use 
the result in my next statement/function until I would want to pass the 
result somewhere else (persist it, transform it into a Spark DataFrame 
etc.). Given that right now the Algorithm.execute() would take the 
generated DML string and execute it using the MLContext, we would be 
free to return anything that the context can return - or wrap it in the 
DSL Matrix type. I am happy to discuss what would be best here!
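
To make the two use cases concrete, here is a minimal sketch in plain Scala. 
Everything in it is an assumption for illustration: this `Matrix`, this 
`Algorithm` and this `parallelize` are stand-ins, not the Emma or SystemML 
types. In particular, the stand-in `parallelize` only defers evaluation of 
its body; it does not generate DML. The point is only that both use cases 
hand back the DSL Matrix type and hide the underlying representation.

```scala
// Hypothetical DSL-level matrix: callers see dimensions and cells,
// but never the backing representation.
final class Matrix private (private val data: Array[Array[Double]]) {
  def rows: Int = data.length
  def cols: Int = if (data.isEmpty) 0 else data(0).length
  def apply(i: Int, j: Int): Double = data(i)(j)

  // Naive matrix product, named after the DML operator.
  def %*%(that: Matrix): Matrix = {
    require(cols == that.rows, "inner dimensions must match")
    Matrix(Array.tabulate(rows, that.cols) { (i, j) =>
      (0 until cols).map(k => data(i)(k) * that.data(k)(j)).sum
    })
  }
}

object Matrix {
  def apply(data: Array[Array[Double]]): Matrix = new Matrix(data)
}

// Stand-in for a prepared computation: nothing runs until execute().
final class Algorithm[T](body: () => T) {
  def execute(): T = body()
}

object Dsl {
  // Stand-in only: captures the block by name instead of translating it.
  def parallelize[T](body: => T): Algorithm[T] = new Algorithm(() => body)
}

object DeferredDemo {
  import Dsl._

  // Use case 1: define now, execute later (like a prepared statement).
  def fun(a: Matrix, b: Matrix): Algorithm[Matrix] = parallelize { a %*% b }

  def main(args: Array[String]): Unit = {
    val A = Matrix(Array(Array(1.0, 2.0), Array(3.0, 4.0)))
    val I = Matrix(Array(Array(1.0, 0.0), Array(0.0, 1.0)))

    val prepared = fun(A, I)            // nothing computed yet
    val C: Matrix = prepared.execute()  // result is a DSL Matrix again

    // Use case 2: submit a snippet and execute it directly.
    val D: Matrix = parallelize { C %*% A }.execute()
    println(D(0, 0))
  }
}
```

Since the result is a Matrix again, it can be fed straight into the next 
snippet, and a conversion to another format would only be needed at the 
boundary.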

To reuse the MLContext, I suggest a global context held in a lazy 
variable in the api package object that is imported when using the DSL. 
The run method would take an implicit argument of type MLContext, so the 
user would not have to pass it explicitly. Because the variable is lazy, 
the context is created only on first use and then reused.
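
A rough sketch of that idea, under stated assumptions: `StubContext` is a 
hypothetical stand-in for MLContext so that the example runs without 
SystemML, the object `api` plays the role of the package object, and this 
`Algorithm` class with its `run` method is made up for illustration.

```scala
// Hypothetical stand-in for MLContext; counts how often it is constructed.
final class StubContext {
  StubContext.created += 1
  def execute(script: String): String = s"executed: $script"
}

object StubContext {
  var created = 0
}

object api {
  // Constructed on first access only, then shared by every caller
  // that imports api._
  implicit lazy val context: StubContext = new StubContext
}

// Holds a script string and runs it on whichever context is in implicit scope.
final class Algorithm(val dml: String) {
  def run()(implicit ctx: StubContext): String = ctx.execute(dml)
}

object ContextDemo {
  import api._

  def main(args: Array[String]): Unit = {
    val first  = new Algorithm("C = A %*% B").run() // context is created here
    val second = new Algorithm("D = t(C)").run()    // same context is reused
    println(first)
    println(second)
    println(StubContext.created) // prints 1
  }
}
```

A user who needs a differently configured context could still pass one 
explicitly, e.g. `new Algorithm("...").run()(myContext)`, without touching 
the global one.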

2) I think it should be possible to formulate semantically equivalent 
operations using breeze - the question is if the maintenance and 
implementation of two operational APIs makes sense and is feasible. The 
breeze rapid prototyping would be very nice IMO but probably shouldn't 
become a major source of work. As for the DNN operations - we could 
probably find a way of wrapping those, too - but I don't really think it 
makes sense and we might think about how we want to offer DML libraries 
in our DSLs in general. Apart from that, it seems like it is possible to 
call Java functions directly from DML - this might be an interesting 
aspect to keep in mind for UDFs.

3) A frame datatype should definitely be part of the DSL and would 
probably work very similarly to the Matrix abstraction. Right now I am 
working with matrices to figure out what a good way to use the DSL would 
look like. Apart from the general goal and idea of an embedded DSL, this 
includes figuring out what is possible in DML (and SystemML in general). 
The goal should be a DSL that allows for full support of all DML 
features (possibly even more).

I hope this clarifies some of your questions and I will send updates on 
the progress and update the document as I go.


Am 24.09.2016 10:11 schrieb Matthias Boehm:
> thanks for sharing the summary - this is very nice. While looking over
> the example, I had the following questions:
> 1) Output handling: It would be great to see an example how the
> results of Algorithm.execute() are consumed. Do you intend to hand out
> our binary matrix representation or MLContext's Matrix from which the
> user then requests specific output formats? Also if there are multiple
> Algorithm instances, how is the MLContext (with its internal state of
> lazily evaluated intermediates) reused?
> 2) Scala-breeze prototyping: How do you intend to support operations
> that are not supported in breeze? Examples are removeEmpty, table,
> aggregate, rowIndexMax, quantile/centralmoment, cummin/cummax, and DNN
> operations?
> 3) Frame data type and operations: Do you also intend to add a frame
> type and its operations? I think for this initial prototype it is not
> necessarily required but please make the scope explicit.
> Regards,
> Matthias
> fschueler---09/23/2016 04:36:14 PM---As discussed in the related Jira
> (SYSTEMML-451) I have started to implement a prototype/proof of co
> From:
> To:
> Date: 09/23/2016 04:36 PM
> Subject: Proof of Concept: Embedded Scala DSL
> -------------------------
> As discussed in the related Jira (SYSTEMML-451) I have started to
> implement a prototype/proof of concept for an embedded DSL in Scala.
> I have summarized the current approach in a short document that you
> can find on GitHub together with the code:
> [1]
> Please note that current development happens in the Emma project but
> will move to an independent module in the SystemML project once the
> necessary additions to Emma are merged. By having the DSL in a
> separate module, we can include Scala and Emma dependencies only for
> the users that actually want to use the Scala DSL.
> The current code serves as a proof of concept to discuss further
> development with the SystemML community. I especially welcome input
> from SystemML Scala users on the usability of the API design.
> Next steps will include the translation from Scala code to DML with
> support of all features currently supported in DML, including control
> flow structures.
> Also, a coherent way of executing the generated scripts from Scala and
> the interaction with outside data formats (such as Spark Dataframes)
> will be integrated.
> I am happy to answer your questions and discuss the described approach
> here!
> Felix
> Links:
> ------
> [1] 
