systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SYSTEMML-1471) Support PreparedScript for MLContext
Date Fri, 07 Apr 2017 17:44:41 GMT

    [ https://issues.apache.org/jira/browse/SYSTEMML-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961181#comment-15961181 ]

Matthias Boehm commented on SYSTEMML-1471:
------------------------------------------

[~mwdusenb@us.ibm.com] I don't think this is feasible. JMLC is designed for single-threaded,
in-memory scoring inside MR or Spark jobs to allow data-parallel scoring. The implications
are (1) pure in-memory processing (w/o caching, or any other file system lookups), and (2)
no distributed datasets or operations because these are invalid in surrounding data-parallel
jobs. Down the road, we want to remove the entire Spark and MR dependency from this code path
to make it easily embeddable - so the API can't have any references to Spark.
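
To illustrate the scoring model described above, here is a minimal, dependency-free sketch. It does not use the actual JMLC classes: `PreparedScorer`, `DataParallelScoring`, and `scorePartition` are hypothetical names, and the dot product merely stands in for a compiled script. The point is only the shape: a scorer is prepared once per partition and then runs single-threaded and purely in memory, while the surrounding job (simulated here with local collections) supplies the data parallelism.

```scala
// Hypothetical stand-in for a prepared script: set up once, then reused.
// This is NOT the SystemML JMLC API; all names are illustrative only.
final class PreparedScorer(weights: Array[Double]) {
  // Pure in-memory, single-threaded scoring of one row (a dot product).
  def score(row: Array[Double]): Double = {
    require(row.length == weights.length, "row and weights must have equal length")
    var sum = 0.0
    var i = 0
    while (i < row.length) {
      sum += row(i) * weights(i)
      i += 1
    }
    sum
  }
}

object DataParallelScoring {
  // Stand-in for the body of a per-partition function in a surrounding
  // data-parallel job: one scorer per partition, no distributed operations inside.
  def scorePartition(rows: Iterator[Array[Double]], weights: Array[Double]): Iterator[Double] = {
    val scorer = new PreparedScorer(weights) // prepared once per partition
    rows.map(scorer.score)
  }

  def main(args: Array[String]): Unit = {
    val weights = Array(0.5, -1.0)
    // Two "partitions", simulated as local collections.
    val partitions = Seq(
      Seq(Array(2.0, 1.0), Array(4.0, 0.0)),
      Seq(Array(0.0, 3.0))
    )
    val scores = partitions.map(p => scorePartition(p.iterator, weights).toList)
    println(scores)
  }
}
```

Because nothing inside `scorePartition` touches a distributed dataset or the file system, the same function could in principle be called from any enclosing job, or from no job at all.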

[~deron] the plan is to essentially (1) remove all instances of our replicated compilation
chain, (2) create a configurable compilation chain, and (3) simply use this new construct
for all APIs. If you want to keep the current structure or certain classes, maybe you could
take over SYSTEMML-1325 and give it an initial design? The only reason I deferred this task
was the 0.14 release as we probably want to remove the old mlcontext first.
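
As a rough sketch of what a single configurable compilation chain could look like (this is not a design for SYSTEMML-1325), the following assumes hypothetical names throughout: `CompilerConfig`, `CompilationChain`, `ApiConfigs`, and the phase labels do not exist in SystemML. The flag values chosen for the two configurations are assumptions for illustration only.

```scala
// Hypothetical configuration for one shared compilation chain.
// None of these names exist in SystemML; the phases are placeholder labels.
final case class CompilerConfig(
  allowDistributedOps: Boolean, // whether distributed operators may be selected
  dynamicRecompilation: Boolean // whether programs are marked for runtime recompilation
)

object CompilationChain {
  // Derive the ordered list of phases from the configuration,
  // instead of duplicating a slightly different chain per API.
  def phases(config: CompilerConfig): List[String] = {
    val common = List("parse", "validate", "rewrite")
    val backend =
      if (config.allowDistributedOps) List("select-distributed-operators")
      else List("force-single-node-operators")
    val recompile =
      if (config.dynamicRecompilation) List("mark-for-recompilation")
      else Nil
    common ++ backend ++ recompile ++ List("generate-runtime-program")
  }
}

object ApiConfigs {
  // Assumed settings for an embedded, in-memory scoring API (illustrative).
  val embeddedScoring: CompilerConfig =
    CompilerConfig(allowDistributedOps = false, dynamicRecompilation = false)

  // Assumed settings for a Spark-backed API (illustrative).
  val sparkBacked: CompilerConfig =
    CompilerConfig(allowDistributedOps = true, dynamicRecompilation = true)
}

object CompilationChainDemo {
  def main(args: Array[String]): Unit = {
    println(CompilationChain.phases(ApiConfigs.embeddedScoring))
    println(CompilationChain.phases(ApiConfigs.sparkBacked))
  }
}
```

With this shape, each API would differ only in the configuration it passes in, which is the property steps (1) to (3) above are after.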

> Support PreparedScript for MLContext
> ------------------------------------
>
>                 Key: SYSTEMML-1471
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1471
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Niketan Pansare
>
> The intent of this JIRA is three-fold:
> 1. Allow MLContext to be used in prediction scenarios.
> 2. Consolidate the code of JMLC and MLContext.
> 3. Explore what extensions are needed in SystemML to support Spark streaming.
> For the prediction scenario, it is important to reduce the parsing/validation overhead as
> much as possible, and reusing the JMLC infrastructure might be a good step in that direction.
> It is also important that MLContext continues to support dynamic recompilation and other
> optimizations, as the input size could be small (similar to JMLC) but could also be large
> (if the window size is large, making MLContext ideal for this scenario).
> {code}
> val streamingContext = new StreamingContext(sc, SLIDE_INTERVAL)
> val windowDStream  = .....window(WINDOW_LENGTH, SLIDE_INTERVAL)
> val preparedScript = ....prepareScript(....)
> windowDStream.foreachRDD(currentWindow => {
>   if (currentWindow.count() > 0) {
>     ml.execute(preparedScript.in("X", currentWindow.toDF()))
>     ...
>   }
> })
> {code}
> [~deron] [~mboehm7] [~reinwald] [~freiss] [~mwdusenb@us.ibm.com] [~nakul02] Is this something
> that interests any of you?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
