spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
Date Tue, 03 Feb 2015 18:22:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303702#comment-14303702 ]

Marcelo Vanzin commented on SPARK-5388:
---------------------------------------

Hi [~pwendell],

Let me try to write a point-by-point feedback for the current spec.

h4. Public protocol or not?

If this is not supposed to be public (i.e., we don't expect anyone to talk to the Spark
master directly; it will always happen through the Spark libraries), then the underlying
protocol is less important, since we only care about different versions being compatible
in some way.

Assuming a non-public protocol, my question would be: why implement your own RPC framework?
Why not reuse something that's already there? For example, Avro has a stable serialization
infrastructure that defines semantics for versioned data, and works well on top of HTTP. It
handles serialization and dispatching, which would remove a lot of code from the current
patch, and it probably has other features that the current, cluster-mode-only protocol doesn't
need but other future uses might.
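
To make that concrete, here is a very rough, untested sketch of what the client side could
look like on top of Avro's HTTP transport. {{SubmitDriverProtocol}}, {{submitDriver}} and the
port are hypothetical, standing in for an interface generated from an Avro protocol definition,
not anything that exists in Spark today:

{code}
// Hypothetical sketch only; SubmitDriverProtocol would be generated from an Avro
// protocol definition. Avro takes care of serialization, dispatching and schema
// resolution between different versions of the messages.
import java.net.URL
import org.apache.avro.ipc.HttpTransceiver
import org.apache.avro.ipc.specific.SpecificRequestor

val transceiver = new HttpTransceiver(new URL("http://master:6066/"))  // port is illustrative
val client = SpecificRequestor.getClient(classOf[SubmitDriverProtocol], transceiver)
val response = client.submitDriver(request)  // "request" built from the spark-submit arguments
{code}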

h4. Non-submission uses

Similarly, in the non-public protocol scenario, a proper REST-based API may look like overkill.
But a proper REST infrastructure provides interesting room for growth of the master's public-facing
API. For example, you could easily expose an endpoint for listing the applications currently
tracked by the master, or an endpoint to kill an application. The former could also benefit
the history server, which could expose the same API to list the applications it has found.
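
Purely as an illustration (none of these paths exist today, and the names are made up), the
resource layout could look something like:

{code}
GET  /v1/applications           # list applications known to the master (or history server)
GET  /v1/applications/<id>      # details for a single application
POST /v1/applications/<id>/kill # kill a running application
POST /v1/submissions            # the submission use case covered by the current spec
{code}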

h4. Evolvability and Versioning

The current spec does not specify the behavior of either the cluster or the client with regard
to different versions of the protocol. It has a table that basically says "future versions
need to be able to submit standalone cluster applications to a 1.3 master", but it doesn't
explain what that means or how that happens.

Does it mean that, after 1.3, you can't ever change any of the messages used to launch a standalone
cluster app, nor can you add new messages? Or, if that's allowed, what happens on the server
side if it sees a field it doesn't understand? Does it ignore it, which could potentially
break the application being submitted? Does it throw an error, in which case the client should
make sure to submit an older version of the data structures if that's compatible with the
app being submitted? If the latter, how does it know which version to use?

As an example of how you could do this "negotiation": the client checks what features the
app being submitted needs, and chooses the oldest supported API version based on that. It
can then submit the request to, e.g., "/v2"; if it is submitting to a 1.3 cluster, the request
will fail, because that cluster doesn't support the features needed by the app.
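
As an illustration only (none of these names or feature strings exist in Spark), the
client-side selection could look roughly like this:

{code}
// Hypothetical sketch of client-side version negotiation; all names are made up.
// Each protocol version, oldest first, advertises the submission features it supports.
val supportedByVersion = Seq(
  "v1" -> Set("basicSubmit"),
  "v2" -> Set("basicSubmit", "fileUpload")
)

// Pick the oldest version that covers everything the app needs.
def chooseVersion(needed: Set[String]): Option[String] =
  supportedByVersion.find { case (_, features) => needed.subsetOf(features) }.map(_._1)

chooseVersion(Set("basicSubmit"))                // Some("v1"): works against a 1.3 master
chooseVersion(Set("basicSubmit", "fileUpload"))  // Some("v2"): a 1.3 master would reject it
{code}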

Also, thinking about the framework, what if later you need different features than the ones
provided now? What if you need to use query params, path params, or non-JSON request bodies
(e.g. for uploading files)? Are you going to extend the current framework to the point where
it starts looking like other existing ones?

Or, if HTTP is being used mostly as a dumb pipe, what are the semantics for when something
goes wrong? Should clients only care about the response body if the status is "200 OK", or should
they try to interpret the body of a "500 Internal Server Error" or a "400 Bad Request"?
Those things need to be specified.
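
For instance, the spec could nail down a contract along these (purely hypothetical) lines:

{code}
// Illustrative only; these types don't exist in Spark, they just show the kind of
// contract the spec could define for HTTP status codes and response bodies.
sealed trait SubmitResult
case class Accepted(driverId: String) extends SubmitResult   // 2xx: body is a documented success payload
case class Rejected(reason: String) extends SubmitResult     // 4xx: body is a documented, structured error
case class ServerFailure(status: Int) extends SubmitResult   // 5xx: body semantics undefined; don't parse it

def interpret(status: Int, body: String): SubmitResult = status / 100 match {
  case 2 => Accepted(body.trim)
  case 4 => Rejected(body.trim)
  case _ => ServerFailure(status)
}
{code}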

h4. Others

If the suggestions above don't sound particularly interesting for this use case, I'd strongly
suggest, at the very least, removing any mention of REST from the spec and the code, because
this is not a REST protocol in any way.

Also, a question: if it's an HTTP protocol, why not expose it through the existing HTTP port?


To reply to the questions about my suggestions for how to use REST:

* when you add a new version, you don't remove old ones. Spark v1.4 could add "/v2", but it
must still support "/v1" in the way that it was specified.
* as for new fields/types, that really depends on how you specify things. Personally, I
like to declare a released API "frozen": you can't add new types, fields, or anything that
the old release doesn't know about; anything new requires a new protocol version. But you
could take a different approach, adding optional fields that don't cause breakage when
submitted to an old server that doesn't know about them (see the sketch below). Either way,
these choices need to be specified up front; otherwise, wherever the spec is not clear, the
choices made by the v1 implementation become a de facto specification.
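
As a sketch of that second approach (the field names are made up, this is not the actual
message format):

{code}
// Hypothetical message type illustrating the "optional field" policy: a 1.3 server
// that only knows the first two fields can still accept a request from a newer
// client, as long as the spec says absent/unknown optional fields must be ignored.
case class SubmitDriverRequest(
  appResource: String,
  mainClass: String,
  supervise: Option[Boolean] = None   // hypothetical field added in a later protocol version
)
{code}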

(BTW, especially with a v1, the implementation will invariably become a "de facto" specification;
that's unavoidable. But it helps to have the spec clearly cover all areas, so that hopefully
you don't need to reverse-engineer the implementation code to figure out how things work.)

Anyway, I hope this is useful and clarifies some of the concerns I have about the current spec.

> Provide a stable application submission gateway in standalone cluster mode
> --------------------------------------------------------------------------
>
>                 Key: SPARK-5388
>                 URL: https://issues.apache.org/jira/browse/SPARK-5388
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>         Attachments: Stable Spark Standalone Submission.pdf
>
>
> The existing submission gateway in standalone mode is not compatible across Spark versions.
> If you have a newer version of Spark submitting to an older version of the standalone Master,
> it is currently not guaranteed to work. The goal is to provide a stable REST interface to
> replace this channel.
> The first cut implementation will target standalone cluster mode because there are very
> few messages exchanged. The design, however, should be general enough to potentially support
> this for other cluster managers too. Note that this is not necessarily required in YARN because
> we already use YARN's stable interface to submit applications there.


