reef-dev mailing list archives

From "Markus Weimer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1791) Implement reef-runtime-spark
Date Mon, 15 May 2017 22:41:04 GMT

    [ https://issues.apache.org/jira/browse/REEF-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011451#comment-16011451 ]

Markus Weimer commented on REEF-1791:
-------------------------------------

I think it might help to enumerate what a "REEF runtime" actually is and then discuss which
parts of it we want on Spark. A REEF runtime consists of two distinct, potentially even separable
pieces:

*REEF Client:* An implementation of the interfaces necessary to submit a REEF Driver to a
resource manager for execution. In the case of YARN, for example, this means submitting the
Driver as an Application Master.
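
To make the client piece concrete, here is a minimal sketch of what such an implementation
could look like. It is modeled on the YARN runtime's client side; {{SparkJobSubmissionHandler}}
is hypothetical, and the {{JobSubmissionEvent}} package name is approximate:

{code:java}
// Sketch only, not reef-runtime-spark code. EventHandler is REEF's
// single-method Wake event interface; JobSubmissionEvent is the client-side
// submission event the existing runtimes consume.
import org.apache.reef.runtime.common.client.api.JobSubmissionEvent;
import org.apache.reef.wake.EventHandler;

/** Hypothetical client piece: what "submission" would mean on Spark. */
final class SparkJobSubmissionHandler implements EventHandler<JobSubmissionEvent> {
  @Override
  public void onNext(final JobSubmissionEvent jobSubmission) {
    // On YARN, this is where the Application Master gets submitted.
    // On Spark, there is no separate submission: the REEF Driver starts
    // inside the already-running Spark Driver JVM (see below).
  }
}
{code}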

*REEF Driver:* On the Driver side, a runtime consists of the implementations of all the interfaces
necessary to process Evaluator requests, generate {{AllocatedEvaluator}} events and launch
the actual Evaluators. In the YARN example, much of this boils down to 1:1 mappings between
the REEF and YARN concepts. Another example: in the case of the local runtime, this part is
a bit more involved, as it has to actually spawn the processes.
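
Again only as a sketch: the Driver-side extension point is an event handler for resource
requests. The event and package names below follow the YARN runtime and are approximate;
{{SparkResourceRequestHandler}} is hypothetical:

{code:java}
// Sketch of the Driver-side contract; names modeled on the YARN runtime.
import org.apache.reef.runtime.common.driver.api.ResourceRequestEvent;
import org.apache.reef.wake.EventHandler;

/** Hypothetical: turn a REEF Evaluator request into a resource request. */
final class SparkResourceRequestHandler implements EventHandler<ResourceRequestEvent> {
  @Override
  public void onNext(final ResourceRequestEvent request) {
    // On YARN this maps roughly 1:1 onto a container request.
    // The local runtime spawns a process here instead.
    // On Spark, one option is to map it onto Executors (see discussion below).
  }
}
{code}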


The *Spark Runtime* launches a REEF Job from an existing Spark job. Hence, we don't need a
client so much as we need [~motus]'s work on running the REEF Driver in the same JVM as the
Spark Driver. There is no "submission" of a job to a "cluster". Hence, this is more or less
already solved.
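
In code, the in-process launch could look roughly like this. {{DriverLauncher}} is REEF's
existing client helper; the Spark runtime {{Configuration}} passed in is the hypothetical
piece this module would provide:

{code:java}
// Sketch: launching a REEF Driver in-process from a running Spark job.
import org.apache.reef.client.DriverLauncher;
import org.apache.reef.client.LauncherStatus;
import org.apache.reef.tang.Configuration;
import org.apache.reef.tang.exceptions.InjectionException;

final class InProcessLaunch {
  static LauncherStatus launch(final Configuration sparkRuntimeConf,
                               final Configuration driverConf)
      throws InjectionException {
    // No cluster submission: the REEF Driver shares the Spark Driver's JVM.
    return DriverLauncher.getLauncher(sparkRuntimeConf).run(driverConf);
  }
}
{code}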

Now, for the Driver APIs, I think we can indeed rely on Spark constructs as [~minterlandi]
suggested. The typical case would be to ask for one Evaluator (represented as an {{ActiveContext}})
per Spark Executor or RDD partition. Unlike in all other REEF runtimes, those Evaluators
would "just show up" without being asked for.

Which leaves the question of how to ask for _additional_ Evaluators on Spark. For that, I
see two options: (1) ask YARN, or (2) ask Spark. I'm not sure how to do the second of these,
though.
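
For option (2), one candidate is Spark's developer API for dynamic allocation.
{{SparkContext.requestExecutors()}} exists, but it only takes effect when the cluster manager
supports dynamic allocation, and whether it fits REEF's Evaluator lifecycle is exactly the
open question:

{code:java}
// Sketch for option (2): ask Spark for more Executors at runtime.
import org.apache.spark.api.java.JavaSparkContext;

final class ExtraEvaluators {
  static boolean askSparkForOne(final JavaSparkContext jsc) {
    // Ask the cluster manager (e.g. YARN) for one more Executor; a new
    // Evaluator could then be bootstrapped on it as sketched earlier.
    return jsc.sc().requestExecutors(1);
  }
}
{code}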

> Implement reef-runtime-spark
> ----------------------------
>
>                 Key: REEF-1791
>                 URL: https://issues.apache.org/jira/browse/REEF-1791
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF
>            Reporter: Sergiy Matusevych
>            Assignee: Saikat Kanjilal
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We need to run REEF Tasks on Spark Executors. Ideally, that should require only a few
> lines of changes in the REEF application configuration. All Spark-related logic must be
> encapsulated in the {{reef-runtime-spark}} module, similar to the existing
> {{reef-runtime-yarn}} and {{reef-runtime-local}} modules. As a first step, we can have a
> Java-only solution, but later we'll need to run .NET Tasks on Executors as well.



