reef-dev mailing list archives

From "Markus Weimer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-936) Support scale-down in REEF
Date Thu, 05 May 2016 18:08:13 GMT

    [ https://issues.apache.org/jira/browse/REEF-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272745#comment-15272745 ]

Markus Weimer commented on REEF-936:
------------------------------------

While reviewing the current pull requests for this issue, I took another look at {{MultiRuntimeConfigurationBuilder}}.
Its API now seems very counter-intuitive to me. Consider the code in {{HelloREEFMultiYarn.getHybridYarnSubmissionRuntimeConfiguration}}:

{code}
return new MultiRuntimeConfigurationBuilder()
  .setDefaultRuntime(org.apache.reef.runtime.yarn.driver.RuntimeIdentifier.RUNTIME_NAME)
  .setSubmissionRuntime(org.apache.reef.runtime.yarn.driver.RuntimeIdentifier.RUNTIME_NAME)
  .addRuntime(org.apache.reef.runtime.local.driver.RuntimeIdentifier.RUNTIME_NAME)
  .addRuntime(org.apache.reef.runtime.yarn.driver.RuntimeIdentifier.RUNTIME_NAME)
  .setMaxEvaluatorsNumberForLocalRuntime(1)
  .build();
{code}

At no point does the user actually configure the runtimes in question. For instance, what
if they want to submit to a YARN instance that requires a specific user name? There doesn't
seem to be a way to specify that. And what if we want to support Mesos next? Coupling the
multi-runtime this tightly to the (small) set of supported runtimes makes all the code
associated with this change far more complex than it needs to be. What prevents us from
creating an API that makes the client code look more like this:

{code}
return new MultiRuntimeConfigurationBuilder()
  .addSubmissionRuntime(YARN, YarnClientConfiguration.CONF.build())
  .addDefaultRuntime(YARN, YarnDriverConfiguration.CONF.build())
  .addRuntime(LOCAL, LocalDriverConfiguration.CONF.set(NUMBER_OF_EVALUATORS, 1).build())
  .build();
{code}

This would allow an arbitrary set of runtimes to take part in the multi-runtime. Also, it
would remove all special-casing in the multi-runtime for specific (combinations of) runtimes.
This should simplify the code in the multi-runtime tremendously.
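
To make the proposal concrete, here is a minimal, self-contained sketch of how such a builder could be structured. It is not REEF code: the class {{MultiRuntimeSketch}}, its method names, and the use of plain string maps in place of Tang {{Configuration}} objects are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of REEF. Plain String maps stand in for the
 * per-runtime configurations that a real API would pass around.
 */
final class MultiRuntimeSketch {
  private final String submissionRuntime;
  private final Map<String, String> submissionConf;
  private final String defaultRuntime;
  private final Map<String, Map<String, String>> driverConfs;

  private MultiRuntimeSketch(final Builder b) {
    this.submissionRuntime = b.submissionRuntime;
    this.submissionConf = b.submissionConf;
    this.defaultRuntime = b.defaultRuntime;
    this.driverConfs = Collections.unmodifiableMap(new LinkedHashMap<>(b.driverConfs));
  }

  String getSubmissionRuntime() { return submissionRuntime; }
  Map<String, String> getSubmissionConf() { return submissionConf; }
  String getDefaultRuntime() { return defaultRuntime; }
  Map<String, Map<String, String>> getDriverConfs() { return driverConfs; }

  static final class Builder {
    private String submissionRuntime;
    private Map<String, String> submissionConf;
    private String defaultRuntime;
    // Runtime name -> driver-side configuration. The builder never inspects
    // the configuration, so it needs no knowledge of specific runtimes.
    private final Map<String, Map<String, String>> driverConfs = new LinkedHashMap<>();

    /** Client side: the runtime the job is submitted to, with its configuration. */
    Builder addSubmissionRuntime(final String name, final Map<String, String> clientConf) {
      if (submissionRuntime != null) {
        throw new IllegalStateException("Submission runtime already set: " + submissionRuntime);
      }
      submissionRuntime = name;
      submissionConf = copy(clientConf);
      return this;
    }

    /** Driver side: registers a runtime and marks it as the default one. */
    Builder addDefaultRuntime(final String name, final Map<String, String> driverConf) {
      if (defaultRuntime != null) {
        throw new IllegalStateException("Default runtime already set: " + defaultRuntime);
      }
      addRuntime(name, driverConf);
      defaultRuntime = name;
      return this;
    }

    /** Driver side: registers an additional runtime. */
    Builder addRuntime(final String name, final Map<String, String> driverConf) {
      if (driverConfs.putIfAbsent(name, copy(driverConf)) != null) {
        throw new IllegalStateException("Runtime already added: " + name);
      }
      return this;
    }

    MultiRuntimeSketch build() {
      if (submissionRuntime == null || defaultRuntime == null) {
        throw new IllegalStateException("Both a submission and a default runtime are required");
      }
      return new MultiRuntimeSketch(this);
    }

    private static Map<String, String> copy(final Map<String, String> conf) {
      return Collections.unmodifiableMap(new LinkedHashMap<>(conf));
    }
  }
}
```

In this sketch, adding a further runtime such as Mesos would be one more {{addRuntime}} call with that runtime's own configuration; the builder itself would not change.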

On the flip side, each runtime would then need to provide those APIs, namely separate
{{ConfigurationBuilder}}s for the Driver side and the Client side. That should be doable;
the {{Extensible*}} builders introduced as part of this work already provide this.

WDYT?

> Support scale-down in REEF
> --------------------------
>
>                 Key: REEF-936
>                 URL: https://issues.apache.org/jira/browse/REEF-936
>             Project: REEF
>          Issue Type: Improvement
>          Components: REEF
>            Reporter: Markus Weimer
>            Assignee: Boris Shulman
>
> The minimal useful REEF job right now consists of two containers: One for the Driver
> and one for the Evaluator. This is a fitting design for applications that need to scale out.
> However, some apps might need to be elastic to workloads the size of a single Task, at
> which point the overhead for the Driver becomes substantial.
> While there is no plan of action yet, let's use this JIRA as the link target for future
> work towards this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
