heron-dev mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Proposal for Heron API Server
Date Tue, 25 Jul 2017 21:06:13 GMT
+1 for more upfront clarification of the authentication model.

I don't think it makes sense to combine all REST services into one, since
they serve different functional areas. They should be separated by their
concerns, their function, their auth needs, their SLA, etc. This will help
with maintainability of the code and manageability of the services. Large
over-coupled services are often split apart into smaller microservices for
just these reasons.

For example, combining the tracker with the API service means:

- You can't make updates to and redeploy the tracker (which is a
non-critical service) without redeploying the submit/kill/update service
(which is more critical). Both now have their lifecycles coupled.
- A critical bug or outage of one service now affects the other.
- The performance/demand characteristics of one impacts the other.
- Cross-cutting API concerns like authentication or caching get more
complicated to support. The API service is a read/write service with the
need for auth, while the tracker service is read-only.
- Code to manage specific functional areas (e.g., metrics serving vs
topology lifecycle management) is unnecessarily coupled.
- Security concerns become coupled. For example, consider the case where
the tracker port should be exposed through a firewall to serve read-only
data, but the service API should be behind the firewall for tighter
security control. Combining the services into one couples them to the same
port. This has caused us a number of headaches with Presto, which uses the
same port for everything (read-only APIs, user UIs, system-level RPC, etc.).
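
The port/firewall point above can be illustrated with a minimal sketch: two
separate services, each bound to its own port, so each can have its own
firewall rule and auth policy. Everything here (handlers, the bearer-token
check, the JSON payload) is made up for illustration; it is not Heron's
actual tracker or API-server code.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import threading

class TrackerHandler(BaseHTTPRequestHandler):
    """Read-only metadata endpoint: GET only, no auth required."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"topologies": []}')

    def log_message(self, *args):  # keep the sketch quiet
        pass

class ApiHandler(BaseHTTPRequestHandler):
    """Read/write lifecycle endpoint: every POST must carry credentials."""
    def do_POST(self):
        # drain the request body before answering
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.headers.get("Authorization") != "Bearer secret-token":
            self.send_response(401)  # unauthenticated writes rejected
            self.end_headers()
            return
        self.send_response(202)  # e.g., topology submit accepted
        self.end_headers()

    def log_message(self, *args):
        pass

def serve(handler):
    """Start a handler on its own port so each service keeps its own
    firewall rule, auth policy, and deploy lifecycle."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

tracker = serve(TrackerHandler)  # could be exposed through the firewall
api = serve(ApiHandler)          # stays behind it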

The state manager python code is also used by the Heron Executor, so that
would also need to be re-implemented in Java if the goal is to get rid of
it. I think it's a fine idea to reimplement tracker in Java though, but
that still doesn't make the case to combine the two services into one.


On Tue, Jul 25, 2017 at 12:06 PM, Sanjeev Kulkarni <sanjeevrk@gmail.com>
wrote:

> One comment that I have wrt the API server is some sort of design note on
> how you plan to accomodate authentication(atleast on some popular
> schedulers like DCOS and Kubernetics). It need not be a full fledged
> detailed document, but should have the basic contours.
>
>
> On Tue, Jul 25, 2017 at 11:54 AM, Karthik Ramasamy <karthik@streaml.io>
> wrote:
>
>> Bill -
>>
>> The main driving factors for API server are the following -
>>
>> - How can heron jobs be managed using purely API instead of using CLI
>> without any dependency?
>>
>> - A single place for maintaining config including keys (which you don’t
>> want to expose to every client)
>>
>> - Reduce the installation steps needed
>>
>> - Provide authentication support
>>
>> Rest of my responses are inlined as k>
>>
>> > On Jul 25, 2017, at 10:15 AM, Bill Graham <billgraham@gmail.com> wrote:
>> >
>> > It's not entirely accurate that Heron's deployment mode is only library
>> > mode. The Heron scheduler could be implemented to either manage resource
>> > scheduling from the client (i.e., Aurora Scheduler) or to run the
>> > scheduling logic on the scheduler framework (i.e., Yarn).
>> ISchedulerClient
>> > has both LIbrarySchedulerClient and HttpServiceSchedulerClient for these
>> > two use cases. These are the modes for scheduling single topologies
>> > components though, and not about managing a centralized scheduling
>> service
>> > for multi-tenant usage which this proposal is about. Basically, we
>> already
>> > have Library and Service modes as terminology in the existing codebase,
>> so
>> > we shouldn't overload the concepts with a new definition of service
>> mode.
>>
>> k>I agree I might be overloading the terminology here. Whether the
>> scheduler is run
>> in library mode vs in the schedule framework is independent of the API
>> server. API
>> server is not a scheduling service - it just a REST end point server that
>> translates
>> REST API into actions. Perhaps a change of terminology might make it
>> easier to understand.
>> Any suggestions?
>>
>> > If config distribution is the main issue, have we explored adding
>> support
>> > for fetching configs from a repository, just as we upload and fetch the
>> > binary?
>>
>> k>We did explore this aspect of having a config in a central place.
>> However, there are issues with this approach
>>
>> - Heron cli have to download every time it has to
>> submit/kill/activate/deactivate the topologies. Alternatively,
>> the config can be cached but it require invalidation and refresh
>> periodically at the client side - which could lead
>> to issues.
>>
>> - All the keys and important stuff could be exposed on the client (if you
>> are working with cloud environments)
>>
>> - If we have to manage the jobs programmatically including
>> submission/killing/updating/activating/deactivating, it
>> introduces a dependency - such as downloading config before submitting
>> making it cumbersome for programmers.
>>
>> >
>> > One concern about adding a scheduling service, is that it creates yet
>> > another service to be maintained, and it increases the matrix of modes
>> of
>> > deployment available which adds complexity. For example today Aurora
>> > topologies can be submitted in local mode only, but they can be updated
>> in
>> > local or service mode. YARN does both submit and update in service mode
>> > today. With this additional service, we would need to support those
>> modes,
>> > plus those modes when run behind yet another service. The combination of
>> > modes gets complex because we now anywhere from 0..2 potential layers of
>> > services to go through.
>>
>> k>As pointed out above, this is not a scheduling service - it is just a
>> rest end point. The API
>> service will be deployed as yet another job similar to heron-ui and
>> heron-tracker. This service
>> will be stateless and hence it will be restarted by the scheduler if it
>> dies - which means it is fault
>> tolerant. We can run multiple instance of the service as well for
>> scalability.
>>
>> Furthermore, the API server will preserve those deployment modes for
>> Aurora and YARN - independent
>> of whether you deploy using API server or directly from the dev machine
>> (like we have now).
>>
>>
>> > This approach also requires the design of a delegated auth mechanism.
>> For
>> > example if the deploy service is running as a shared account, how will
>> it
>> > delegate auth on behalf of the user who is deploying the topology? If
>> we go
>> > down this path, we'd need to design for this.
>>
>> k>As I mentioned earlier, one of the motivations for API server is to
>> implement some kind of authentication
>> - Kerberos/TLS/LDAP. However, the first phase will be providing the
>> functionality followed by the 2nd phase
>> which includes an authentication mechanism.
>>
>> > I also share Maosong's concern of merging the tracker into the api
>> service.
>> > The design of the system will be more clear and easy to maintain/manage
>> if
>> > each system could live independently. If the goal is to make it easier
>> for
>> > administrators to manage all at once, I'd suggest we handle that with
>> admin
>> > management scripts that could simplify common tasks without merging the
>> > service code.
>>
>> k>In fact, I would argue the other way around - since the main focus of
>> the API server to provide REST api
>>
>> - Why not move all the API’s into one single service rather having two?
>>
>> - Furthermore, the current tracker uses state manager for getting
>> metadata etc. Since tracker
>> uses python, the state manager functionality needs to be duplicated in
>> python and Java.
>>
>> With API server the plan is to write in Java and we can eliminate all the
>> python code for state manager
>> thereby reducing duplicate functionality in different languages. Our
>> initial focus to get this service rolled
>> out with the first phase of API submit/kill/update/activate/deactivate
>> and in the second phase we can
>> merge the tracker.
>>
>> Note that the introduction of server does not change in any way the
>> current mode of deployment.
>>
>> cheers
>> /karthik
>>
>> > On Mon, Jul 24, 2017 at 6:27 PM, Karthik Ramasamy <karthik@streaml.io>
>> > wrote:
>> >
>> >> 1st version of the api server will support the following commands
>> >>
>> >> - submit
>> >> - kill
>> >> - update
>> >> - activate
>> >> - deactivate
>> >>
>> >> We are designing API server to be stateless and it will run as a job
>> in the
>> >> scheduler (similar to tracker and UI). With this approach, there is no
>> need
>> >> to worry about availability issues.
>> >>
>> >> cheers
>> >> /karthik
>> >>
>> >> On Mon, Jul 24, 2017 at 5:43 PM, Fu Maosong <maosongfu@gmail.com>
>> wrote:
>> >>
>> >>> I like the idea of *service mode* for heron.
>> >>>
>> >>> But we need to be more cautious about merging tracker into API Server,
>> >>> since it can easily bring scalability and availability issues.
>> >>> BTW, storm's nimbus serves both topology management requests as well
>> as
>> >>> metrics requests, which is kind of "merging tracker into API server".
>> We
>> >>> can learn the pros&cons of such design from it.
>> >>>
>> >>>
>> >>> 2017-07-24 16:57 GMT-07:00 Karthik Ramasamy <karthik@streaml.io>:
>> >>>
>> >>>> *Rationale*:
>> >>>>
>> >>>> Currently, Heron supports a single mode of deployment called library
>> >>> mode.
>> >>>> Library mode requires several steps and client side configuration
>> which
>> >>>> could be intensive. Hence, we want to support another mode called
>> >> service
>> >>>> mode for simplified deployment.
>> >>>>
>> >>>> *Library Mode:*
>> >>>>
>> >>>> With Heron, the current mode of deployment is called library mode.
>> This
>> >>>> mode does not require any services running for Heron to deploy which
>> >> is a
>> >>>> huge advantage. However, it requires several configuration to be
in
>> the
>> >>>> client side. Because of this administering becomes harder -
>> especially
>> >>>> maintaining the configuration and distributing them when the
>> >>> configuration
>> >>>> is changed. While this is possible for a bigger teams with dedicated
>> >>>> dev-ops team, it might be overhead for medium and smaller teams.
>> >>>> Furthermore, this mode of deployment does not have an API to
>> >>>> submit/kill/activate/deactivate programmatically.
>> >>>>
>> >>>> *Service Mode:*
>> >>>>
>> >>>> In this mode, an api server will be running as a service. This
>> service
>> >>> will
>> >>>> be run as yet another job in the scheduler so that it will be
>> restarted
>> >>>> during machine and process failures thereby providing fault
>> tolerance.
>> >>> This
>> >>>> api server will maintain the configuration and heron cli will be
>> >>> augmented
>> >>>> to use the rest API to submit/kill/activate/deactivate the
>> topologies
>> >> in
>> >>>> this mode. The advantage of this mode is it simplifies deployment
but
>> >>>> requires running a service.
>> >>>>
>> >>>> *Merging Tracker into API Server:*
>> >>>>
>> >>>> Current, Heron tracker written in python duplicates the state manager
>> >>> code
>> >>>> in python as well. The API server will support the heron tracker
api
>> in
>> >>>> addition to topologies api. Depending on the mode of the deployment,
>> >> the
>> >>>> api server can be deployed in one of the modes - library mode (which
>> >>>> exposes only the tracker API) and services mode (which exposes both
>> the
>> >>>> tracker + api server). Initially, the tracker and api server will
be
>> in
>> >>>> separate directory until great amount of testing is done. Once it
is
>> >>>> completed, we can think about cutting over to entirely using API
>> >> server.
>> >>>>
>> >>>> This change will not affect any of the existing deployments and
it
>> will
>> >>> be
>> >>>> backward compatible.
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> With my best Regards
>> >>> ------------------
>> >>> Fu Maosong
>> >>> Twitter Inc.
>> >>> Mobile: +001-415-244-7520
>> >>>
>> >>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message