heron-dev mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Proposal for Heron API Server
Date Tue, 25 Jul 2017 17:15:00 GMT
It's not entirely accurate that Heron's only deployment mode is library
mode. A Heron scheduler can be implemented either to manage resource
scheduling from the client (e.g., the Aurora scheduler) or to run the
scheduling logic on the scheduler framework (e.g., YARN). ISchedulerClient
has both LibrarySchedulerClient and HttpServiceSchedulerClient for these
two use cases. These are modes for scheduling a single topology's
components, though, not for managing a centralized, multi-tenant scheduling
service, which is what this proposal is about. Basically, we already
have Library and Service modes as terminology in the existing codebase, so
we shouldn't overload the concepts with a new definition of service mode.
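
As a rough illustration of the two existing modes, here is a minimal Python
sketch. It is not Heron's code: SchedulerClient, LibraryClient, ServiceClient,
InProcessScheduler, on_kill, and the "/kill" path are all hypothetical names
that only mirror the roles ISchedulerClient, LibrarySchedulerClient, and
HttpServiceSchedulerClient play.

```python
from abc import ABC, abstractmethod


class SchedulerClient(ABC):
    """Hypothetical stand-in for the role ISchedulerClient plays."""

    @abstractmethod
    def kill_topology(self, name):
        """Ask the scheduler to kill the named topology."""


class InProcessScheduler:
    """Hypothetical scheduler whose logic runs inside the client process."""

    def on_kill(self, name):
        return f"killed {name} in-process"


class LibraryClient(SchedulerClient):
    """Library mode: call the scheduler logic directly from the client."""

    def __init__(self, scheduler):
        self._scheduler = scheduler

    def kill_topology(self, name):
        return self._scheduler.on_kill(name)


class ServiceClient(SchedulerClient):
    """Service mode: forward the request to a scheduler running elsewhere."""

    def __init__(self, send):
        # send is any callable(path, payload) -> response, e.g. an HTTP POST.
        self._send = send

    def kill_topology(self, name):
        return self._send("/kill", {"topology": name})
```

The caller sees the same interface in both cases. What differs is where the
scheduling logic runs, which is a narrower meaning of "service mode" than the
one in the proposal.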

If config distribution is the main issue, have we explored adding support
for fetching configs from a repository, just as we upload and fetch the
binary?

One concern about adding a scheduling service is that it creates yet
another service to maintain, and it enlarges the matrix of available
deployment modes, which adds complexity. For example, today Aurora
topologies can be submitted in local mode only, but they can be updated in
local or service mode. YARN does both submit and update in service mode
today. With this additional service, we would need to support those modes,
plus those modes when run behind yet another service. The combination of
modes gets complex because we now have anywhere from 0 to 2 potential
layers of services to go through.
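
To make the counting concrete, here is a small Python sketch that enumerates
the combinations. It assumes "local" means library mode, covers only the
Aurora and YARN cases mentioned above, and uses names (EXISTING_MODES,
service_layers, enumerate_paths) made up for the illustration.

```python
# Modes as described in this message; "local" is treated here as library mode.
EXISTING_MODES = {
    ("aurora", "submit"): ["library"],
    ("aurora", "update"): ["library", "service"],
    ("yarn", "submit"): ["service"],
    ("yarn", "update"): ["service"],
}


def service_layers(scheduler_mode, behind_api_server):
    """Count the services a request passes through (0, 1, or 2)."""
    return int(scheduler_mode == "service") + int(behind_api_server)


def enumerate_paths(existing_modes):
    """List every (scheduler, command, mode, behind_api_server, layers) path."""
    paths = []
    for (scheduler, command), modes in existing_modes.items():
        for mode in modes:
            for behind_api_server in (False, True):
                layers = service_layers(mode, behind_api_server)
                paths.append((scheduler, command, mode, behind_api_server, layers))
    return paths


if __name__ == "__main__":
    for path in enumerate_paths(EXISTING_MODES):
        print(path)
```

Under these assumptions the five paths that exist today become ten, spanning
zero, one, and two service layers.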

This approach also requires the design of a delegated auth mechanism. For
example, if the deploy service is running as a shared account, how will it
delegate auth on behalf of the user who is deploying the topology? If we go
down this path, we'd need to design for this.

I also share Maosong's concern about merging the tracker into the API
service. The design of the system will be clearer and easier to maintain
and manage if each system can live independently. If the goal is to make it
easier for administrators to manage everything at once, I'd suggest we
handle that with admin management scripts that simplify common tasks
without merging the service code.

thanks,
Bill

On Mon, Jul 24, 2017 at 6:27 PM, Karthik Ramasamy <karthik@streaml.io>
wrote:

> The first version of the API server will support the following commands:
>
> - submit
> - kill
> - update
> - activate
> - deactivate
>
> We are designing the API server to be stateless, and it will run as a job
> in the scheduler (similar to the tracker and UI). With this approach, there
> is no need to worry about availability issues.
>
> cheers
> /karthik
>
> On Mon, Jul 24, 2017 at 5:43 PM, Fu Maosong <maosongfu@gmail.com> wrote:
>
> > I like the idea of *service mode* for heron.
> >
> > But we need to be more cautious about merging the tracker into the API
> > server, since it can easily bring scalability and availability issues.
> > BTW, Storm's Nimbus serves both topology management requests and
> > metrics requests, which is a kind of "merging tracker into API server". We
> > can learn the pros and cons of such a design from it.
> >
> >
> > 2017-07-24 16:57 GMT-07:00 Karthik Ramasamy <karthik@streaml.io>:
> >
> > > *Rationale*:
> > >
> > > Currently, Heron supports a single mode of deployment called library
> > > mode. Library mode requires several steps and client-side
> > > configuration, which could be intensive. Hence, we want to support
> > > another mode called service mode for simplified deployment.
> > >
> > > *Library Mode:*
> > >
> > > With Heron, the current mode of deployment is called library mode. This
> > > mode does not require any services running for Heron to deploy, which
> > > is a huge advantage. However, it requires several pieces of
> > > configuration on the client side. Because of this, administration
> > > becomes harder, especially maintaining the configuration and
> > > distributing it when it changes. While this is feasible for bigger
> > > teams with a dedicated dev-ops team, it might be overhead for medium
> > > and smaller teams. Furthermore, this mode of deployment does not have
> > > an API to submit/kill/activate/deactivate topologies programmatically.
> > >
> > > *Service Mode:*
> > >
> > > In this mode, an API server will be running as a service. This service
> > > will be run as yet another job in the scheduler so that it will be
> > > restarted after machine and process failures, thereby providing fault
> > > tolerance. This API server will maintain the configuration, and the
> > > heron cli will be augmented to use the REST API to
> > > submit/kill/activate/deactivate topologies in this mode. The advantage
> > > of this mode is that it simplifies deployment, but it requires running
> > > a service.
> > >
> > > *Merging Tracker into API Server:*
> > >
> > > Currently, the Heron tracker, written in Python, duplicates the state
> > > manager code in Python as well. The API server will support the Heron
> > > tracker API in addition to the topologies API. Depending on the mode of
> > > deployment, the API server can be deployed in one of two modes: library
> > > mode (which exposes only the tracker API) and service mode (which
> > > exposes both the tracker and the API server). Initially, the tracker
> > > and API server will be in separate directories until a great amount of
> > > testing is done. Once it is completed, we can think about cutting over
> > > to using the API server entirely.
> > >
> > > This change will not affect any of the existing deployments and it will
> > > be backward compatible.
> > >
> >
> >
> >
> > --
> > With my best Regards
> > ------------------
> > Fu Maosong
> > Twitter Inc.
> > Mobile: +001-415-244-7520
> >
>
