systemml-issues mailing list archives

From "Niketan Pansare (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-1471) Support PreparedScript for MLContext
Date Fri, 07 Apr 2017 18:50:42 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961277#comment-15961277
] 

Niketan Pansare commented on SYSTEMML-1471:
-------------------------------------------

I think that if certain settings are common to popular ML tasks, it is OK to keep a separate API
for them. For example: JMLC for in-memory scoring, and MLContext for the Spark and Python settings.
But I would really prefer that all APIs have the same user feel. For example, JMLC uses setMatrix,
whereas Scala MLContext has in and Python MLContext has input, which is a headache.

We need to separate user-facing API classes from internal classes for the sake of the discussion.
Here is an initial proposal for the user-facing classes:
- One context for each API (MLContext or JMLC) --> used for initialization, optional settings
(setStatistics, setExplain, ...), and execute(script).
- One script representation (Script, PreparedScript or JMLCPreparedScript) --> used for
setting input and output variables as well as command-line parameters.
- One result representation (MLResults or JMLCResults) --> returns output variables in the
user-specified format (e.g., DataFrame, RDD, double[][], ...).

If absolutely required, we can add ScriptExecutor to user-facing classes.

{code}
val ctx = new MLContext(sc) // or new JMLC (will not have sc) or MLContext(sc)
val script = new Script(...) // or PreparedScript(...) or JMLCPreparedScript(...);
                             // this way PreparedScript is a subclass of Script
while (....) {
	val results = ctx.execute(script); // execute the dml program
}
{code}
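
To make the shape of the proposal a bit more concrete, here is a minimal, self-contained Scala sketch of the three user-facing roles (context, script, results). Everything in it is an assumption for illustration only: the trait Context, the ToyContext class, the single binding method name `in`, and the stand-in "engine" that just sums the array bound to "X" are hypothetical and are not SystemML's actual classes or signatures.

```scala
// Hypothetical model of the proposed user-facing classes (NOT SystemML's API).

// One script representation: DML source plus bound inputs and requested outputs.
class Script(val dml: String,
             val inputs: Map[String, Any] = Map.empty,
             val outputs: Seq[String] = Seq.empty) {
  // Subclasses override this so that in/out keep returning their own type.
  protected def copy(ins: Map[String, Any], outs: Seq[String]): Script =
    new Script(dml, ins, outs)

  // One binding name shared by every API (instead of setMatrix / in / input).
  def in(name: String, value: Any): Script = copy(inputs + (name -> value), outputs)
  def out(names: String*): Script = copy(inputs, outputs ++ names)
}

// PreparedScript is a subclass of Script, so a context can execute either one.
class PreparedScript(src: String,
                     ins: Map[String, Any] = Map.empty,
                     outs: Seq[String] = Seq.empty) extends Script(src, ins, outs) {
  override protected def copy(i: Map[String, Any], o: Seq[String]): Script =
    new PreparedScript(dml, i, o)
}

// One result representation: returns an output variable in a requested format.
class Results(values: Map[String, Any]) {
  def getDouble(name: String): Double = values(name).asInstanceOf[Double]
}

// One context per API: initialization, optional settings, and execute(script).
trait Context {
  def execute(script: Script): Results
}

// Stand-in context: it ignores the DML text and returns the sum of the
// Array[Double] bound to "X" under every requested output name.
class ToyContext extends Context {
  var statistics: Boolean = false
  def setStatistics(on: Boolean): this.type = { statistics = on; this }

  def execute(script: Script): Results = {
    val x = script.inputs("X").asInstanceOf[Array[Double]]
    val total = x.sum
    new Results(script.outputs.map(name => name -> total).toMap)
  }
}

object UnifiedApiDemo {
  def main(args: Array[String]): Unit = {
    val ctx: Context = new ToyContext().setStatistics(true)
    val script = new PreparedScript("s = sum(X)").in("X", Array(1.0, 2.0, 3.0)).out("s")
    println(ctx.execute(script).getDouble("s"))
  }
}
```

The point of the sketch is only that the calling code looks the same whichever Context and Script subclass is plugged in; how the real contexts would share compilation and execution internals is left open.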

+1 for removing replicated code. 

> Support PreparedScript for MLContext
> ------------------------------------
>
>                 Key: SYSTEMML-1471
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1471
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Niketan Pansare
>
> The intent of this JIRA is three-fold:
> 1. Allow MLContext to be used in the prediction scenario.
> 2. Consolidate the code of JMLC and MLContext.
> 3. Explore what extensions are needed in SystemML to support Spark streaming.
> For the prediction scenario, it is important to reduce the parsing/validation overhead as
> much as possible, and reusing the JMLC infrastructure might be a good step in that direction.
> It is also important that MLContext continues to support dynamic recompilation and other
> optimizations, as the input size could be small (similar to JMLC) but could also be large (if
> the window size is large, making MLContext ideal for this scenario).
> {code}
> val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)
> val windowDStream  = .....window(WINDOW_LENGTH, SLIDE_INTERVAL)
> val preparedScript = ....prepareScript(....)
> windowDStream.foreachRDD(currentWindow => {
> if (currentWindow.count() > 0) {
>   ml.execute(preparedScript.in("X", currentWindow.toDF()))
>   ...
> }
> })
> {code}
> [~deron] [~mboehm7] [~reinwald] [~freiss] [~mwdusenb@us.ibm.com] [~nakul02] Is this something
> that interests any of you?
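
As a rough illustration of the "prepare once, execute per window" idea in the description above, the following self-contained Scala sketch counts how often the (pretend) parsing step runs. ToyCompiler, its parseCount counter, and WindowDemo are hypothetical stand-ins, not SystemML or Spark Streaming classes, and the "compiled program" is simply a sum over the window.

```scala
// Hypothetical stand-ins for illustration only (NOT SystemML or Spark classes).
object ToyCompiler {
  // Counts how many times the pretend parse/validate step has run.
  var parseCount: Int = 0

  // Pretend that parsing/validation is the expensive step; the returned
  // function plays the role of the already-compiled program.
  def prepare(dml: String): Seq[Double] => Double = {
    parseCount += 1
    xs => xs.sum
  }
}

object WindowDemo {
  // Scores every non-empty window with a script that is prepared only once.
  def scoreWindows(windows: Seq[Seq[Double]]): Seq[Double] = {
    val prepared = ToyCompiler.prepare("s = sum(X)") // parsed once, outside the loop
    windows
      .filter(_.nonEmpty) // skip empty windows, like the count() > 0 check
      .map(prepared)      // only bind new input and execute per window
  }
}
```

Calling ToyCompiler.prepare inside the per-window step instead would increment parseCount once per window, which is the overhead the description wants to avoid.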



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
