cloudstack-dev mailing list archives

From Marc-Aurèle Brothier <ma...@exoscale.ch>
Subject Re: [DISCUSS] CloudStack graceful shutdown
Date Wed, 18 Apr 2018 08:19:59 GMT
As we are already using a list of management server API calls to handle the
scripting of the shutdown/upgrade/start, I manually backported the code:

https://github.com/apache/cloudstack/pull/2578

On Tue, Apr 17, 2018 at 9:31 PM, Rafael Weingärtner <
rafaelweingartner@gmail.com> wrote:

> Ron, that is a good analogy.
>
> There is something else that I forgot to mention. We discussed the issue of
> migrating Jobs/tasks to other management servers. This is not something
> easy to achieve because of the way it is currently implemented in ACS.
> However, as soon as we have a more comprehensive solution to a graceful
> shutdown, this becomes something feasible for us to work on.
>
> I do not know if Ilya is going to develop a graceful shutdown or if someone
> else will pick this up, but we are willing to work on it. Of course, it is
> not something that we would develop right away because it will probably
> take quite some work, and we have some other priorities. However,  I will
> discuss this further internally and see what we can come up with.
>
> On Tue, Apr 17, 2018 at 1:46 PM, Ron Wheeler <
> rwheeler@artifact-software.com> wrote:
>
> > Part of this sounds like the Windows shut down process which is familiar
> > to many.
> >
> > For those who have never used Windows:
> >
> > Once you initiate the shutdown, it asks the tasks to shut down.
> > If tasks have not shut down within a "reasonable period", it lists them
> > and asks you if you want to wait a bit longer, force them to close, or
> > abort the shutdown so that you can manually shut them down.
> > If you "force" a shutdown it closes all of the tasks using all of the
> > brutality at its command.
> > If you abort, then you have to redo the shutdown after you have manually
> > exited from the processes that you care about.
> >
> > This is pretty user friendly but requires that you have a way to signal
> > to a task that it is time to say goodbye.
> >
> > The "reasonable time" needs to have a default that is short enough to
> > make the operator happy and long enough to have a reasonable chance of
> > getting everything stopped without intervention. If you allow the
> > shutdown to proceed after the interval while the operator waits, then
> > you need to refresh the list of running tasks as tasks end.
> >
> > Ron
> >
> >
> > On 17/04/2018 11:27 AM, Rafael Weingärtner wrote:
> >
> >> Ilya and others,
> >>
> >> We have been discussing this idea of a graceful ("nicely") shutdown.
> >> Our feeling is that we (in the CloudStack community) might have been
> >> trying to solve this problem with too much scripting. What if we
> >> developed a more integrated (native) solution?
> >>
> >> Let me explain our idea.
> >>
> >> ACS has a table called “mshost”, which is used to store management
> >> server information. This table is consulted/queried during balancing
> >> and when jobs are dispatched to other management servers. Therefore,
> >> we have been discussing the idea of creating a management API for
> >> management servers. We could have an API method that changes the
> >> state of a management server to “prepare for maintenance” and then
> >> “maintenance” (as soon as all of the tasks/jobs it is managing
> >> finish). The idea is that during rebalancing we would remove the
> >> hosts of servers that are not in the “Up” state (and, of course,
> >> servers in the aforementioned maintenance states would not receive
> >> hosts to manage). Moreover, when we send/dispatch jobs to other
> >> management servers, we could ignore the ones that are not in the
> >> “Up” state (which is something already done).
> >>
> >> By doing this, the graceful shutdown could be executed in a few steps:
> >>
> >> 1 – issue the maintenance method for the management server you desire
> >> 2 – wait until the MS goes into maintenance mode; while there are
> >> still running jobs, it (the management server) will remain in “prepare
> >> for maintenance”
> >> 3 – execute the Linux shutdown command
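The three steps above can be sketched as a small wrapper script. Note that the API methods hinted at here (prepareManagementServerForMaintenance, listManagementServers) are exactly what is being proposed in this thread, not existing CloudStack commands, so the state probe below is stubbed out to keep the sketch self-contained:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed flow; the API calls named in the
# comments are assumptions from this proposal, not existing commands.

ms_state() {
  # A real version might be something like:
  #   cmk list managementservers id="$1" | jq -r '.managementserver[0].state'
  echo "Maintenance"   # stubbed so the sketch runs as-is
}

graceful_shutdown() {
  ms_id=$1
  # Step 1: issue the (proposed) maintenance method for this MS
  # cmk prepareManagementServerForMaintenance id="$ms_id"

  # Step 2: while async jobs are still running, the MS stays in
  # "PrepareForMaintenance"; poll until it reports "Maintenance"
  while [ "$(ms_state "$ms_id")" != "Maintenance" ]; do
    sleep 30
  done

  # Step 3: only now is it safe to run the OS-level shutdown
  echo "safe to run: shutdown -h now"
}

graceful_shutdown 1
```

The same polling loop could also back a forceful variant: after a configurable time in “prepare for maintenance”, hanging jobs could be killed instead of waited on.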
> >>
> >> We would need other API methods to manage MSs then: (i) an API
> >> method to list MSs, and we could even create (ii) an API to remove
> >> old/deactivated management servers, which we currently do not have
> >> (forcing users to apply changes directly in the database).
> >>
> >> Moreover, in this model, we would not kill hanging jobs; we would
> >> wait until they expire and ACS expunges them. Of course, it is
> >> possible to develop a forceful maintenance method as well. Then, when
> >> “prepare for maintenance” takes longer than a configured threshold,
> >> we could kill hanging jobs.
> >>
> >> All of this would allow the MS to be kept up and receiving requests
> >> until it can be safely shut down. What do you guys think about this
> >> approach?
> >>
> >> On Tue, Apr 10, 2018 at 6:52 PM, Yiping Zhang <yzhang@marketo.com>
> wrote:
> >>
> >>> As a cloud admin, I would love to have this feature.
> >>>
> >>> It so happens that I just accidentally restarted my ACS management
> >>> server while two instances were migrating to another Xen cluster
> >>> (via storage migration, not live migration). As a result, both
> >>> instances ended up with corrupted data disks that can't be
> >>> reattached or migrated.
> >>>
> >>> Any feature which prevents this from happening would be great. A
> >>> low-hanging fruit is simply checking whether there are any async
> >>> jobs running, especially any kind of migration jobs or other known
> >>> long-running types of jobs, and warning the operator so that he has
> >>> a chance to abort the server shutdown.
> >>>
> >>> Yiping
> >>>
> >>> On 4/5/18, 3:13 PM, "ilya musayev" <ilya.mailing.lists@gmail.com>
> >>> wrote:
> >>>
> >>>      Andrija
> >>>
> >>>      This is a tough scenario.
> >>>
> >>>      As an admin, the way I would have handled this situation is to
> >>>      advertise the upcoming outage and then take away specific API
> >>>      commands from a user a day before, so he does not cause any
> >>>      long-running async jobs. Once maintenance completes, enable the
> >>>      API commands for the user again. However, I don't know who your
> >>>      user base is and whether this would be an acceptable solution.
> >>>
> >>>      Perhaps also investigate what can be done to speed up your
> >>>      long-running tasks...
> >>>
> >>>      As a side note, we will be working on a feature that would
> >>>      allow for a graceful termination of the process/job, meaning
> >>>      that if the agent notices a disconnect or termination request,
> >>>      it will abort the command in flight. We can also consider
> >>>      restarting these tasks again or what not - but that would not
> >>>      be part of this enhancement.
> >>>
> >>>      Regards
> >>>      ilya
> >>>
> >>>      On Thu, Apr 5, 2018 at 6:47 AM, Andrija Panic <
> >>>      andrija.panic@gmail.com> wrote:
> >>>
> >>>      > Hi Ilya,
> >>>      >
> >>>      > thanks for the feedback - but in the "real world", you need
> >>>      > to "understand" that 60 min is a next-to-useless timeout for
> >>>      > some jobs (if I understand this specific parameter
> >>>      > correctly?? - is the job really canceled, and not only the
> >>>      > job monitoring???)
> >>>      >
> >>>      > My value for "job.cancel.threshold.minutes" is 2880 minutes
> >>>      > (2 days)
> >>>      >
> >>>      > I can tell you that when you have a 500GB volume on CEPH/NFS
> >>>      > (CEPH being the even "worse" case, since reads are slower
> >>>      > during the qemu-img convert process...), a snapshot job will
> >>>      > take many hours. Should I mention 1TB volumes (yes, we had
> >>>      > clients like that...)?
> >>>      > Then attaching a 1TB volume that was uploaded to ACS (it
> >>>      > lives originally on Secondary Storage, and takes time to be
> >>>      > copied over to NFS/CEPH) will take up to a few hours.
> >>>      > Then migrating a 1TB volume from NFS to CEPH, or CEPH to
> >>>      > NFS, also takes time... etc.
> >>>      >
> >>>      > I'm just giving you feedback as a "user", an admin of the
> >>>      > cloud, with zero DEV skills here :), just to make sure you
> >>>      > make practical decisions (and I admit I might be wrong with
> >>>      > my stuff, but just giving you feedback from our public cloud
> >>>      > setup)
> >>>      >
> >>>      >
> >>>      > Cheers!
> >>>      >
> >>>      >
> >>>      >
> >>>      >
> >>>      > On 5 April 2018 at 15:16, Tutkowski, Mike <
> >>>      > Mike.Tutkowski@netapp.com> wrote:
> >>>      >
> >>>      > > Wow, there’s been a lot of good details noted from several
> >>>      > > people on how this process works today and how we’d like it
> >>>      > > to work in the near future.
> >>>      > >
> >>>      > > 1) Any chance this is already documented on the Wiki?
> >>>      > >
> >>>      > > 2) If not, any chance someone would be willing to do so? (A
> >>>      > > flow diagram would be particularly useful.)
> >>>      > >
> >>>      > > > On Apr 5, 2018, at 3:37 AM, Marc-Aurèle Brothier <
> >>>      > > > marco@exoscale.ch> wrote:
> >>>      > > >
> >>>      > > > Hi all,
> >>>      > > >
> >>>      > > > Good point ilya, but as stated by Sergey there are more
> >>>      > > > things to consider before being able to do a proper
> >>>      > > > shutdown. I augmented the script I originally gave you
> >>>      > > > and changed code in CS. What we're doing for our
> >>>      > > > environment is as follows:
> >>>      > > >
> >>>      > > > 1. The MGMT looks for a change in the file
> >>>      > > > /etc/lb-agent, which contains keywords for HAProxy [2]
> >>>      > > > (ready, maint), so that HAProxy can disable the mgmt
> >>>      > > > server on the keyword "maint", and the mgmt server stops
> >>>      > > > a couple of threads [1] to stop processing async jobs in
> >>>      > > > the queue.
> >>>      > > > 2. Look for async jobs and wait until there are none, to
> >>>      > > > ensure you can send the reconnect commands (if jobs are
> >>>      > > > running, a reconnect will result in a failed job, since
> >>>      > > > the result will never reach the management server - the
> >>>      > > > agent waits for the current job to be done before
> >>>      > > > reconnecting, and discards the result... room for
> >>>      > > > improvement here!).
> >>>      > > > 3. Issue a reconnectHost command to all the hosts
> >>>      > > > connected to the mgmt server so that they reconnect to
> >>>      > > > another one; otherwise the mgmt server must stay up,
> >>>      > > > since it is used to forward commands to agents.
> >>>      > > > 4. When all agents are reconnected, we can shut down the
> >>>      > > > management server and perform the maintenance.
> >>>      > > >
> >>>      > > > One issue remains for me: during the reconnect, the
> >>>      > > > commands that are processed at the same time should be
> >>>      > > > kept in a queue until the agents have finished any
> >>>      > > > current jobs and have reconnected. Today, the little time
> >>>      > > > window during which the reconnect happens can lead to
> >>>      > > > failed jobs, due to the agent not being connected at the
> >>>      > > > right moment.
> >>>      > > >
> >>>      > > > I could push a PR for the change to stop some processing
> >>>      > > > threads based on the content of a file. It is also
> >>>      > > > possible to cancel the drain of the management server by
> >>>      > > > simply changing the content of the file back to "ready"
> >>>      > > > again, instead of "maint" [2].
> >>>      > > >
> >>>      > > > [1] AsyncJobMgr-Heartbeat, CapacityChecker, StatsCollector
> >>>      > > > [2] HAProxy documentation on agent checks:
> >>>      > > > https://cbonte.github.io/haproxy-dconv/1.6/configuration.html#5.2-agent-check
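For readers unfamiliar with the agent-check mechanism referenced in [2], a minimal HAProxy backend using it might look like the following sketch. The server addresses and the agent port are made-up placeholders; the agent process on each management server replies "ready" or "maint" based on the /etc/lb-agent file described above:

```
backend cloudstack-mgmt
    balance roundrobin
    # agent-check polls a small agent on each MS; a reply of "maint"
    # drains the server, while "ready" puts it back into rotation
    server ms1 198.51.100.11:8080 check agent-check agent-port 4949 agent-inter 5s
    server ms2 198.51.100.12:8080 check agent-check agent-port 4949 agent-inter 5s
```

With this in place, flipping the file on a management server drains new API traffic away from it without touching the other servers.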
> >>>      > > >
> >>>      > > > Regarding your issue with the port blocking, I think
> >>>      > > > it's fair to consider that if you want to shut down your
> >>>      > > > server at some point, you have to stop serving (some)
> >>>      > > > requests. Here, the only way is to stop serving
> >>>      > > > everything.
> >>>      > > > If the API had a REST design, we could reject any
> >>>      > > > POST/PUT/DELETE operations and allow GET ones. I don't
> >>>      > > > know how hard it would be today to only allow listBaseCmd
> >>>      > > > operations, to be more friendly to the users.
> >>>      > > >
> >>>      > > > Marco
> >>>      > > >
> >>>      > > >
> >>>      > > > On Thu, Apr 5, 2018 at 2:22 AM, Sergey Levitskiy <
> >>>      > > > serg38l@hotmail.com> wrote:
> >>>      > > >
> >>>      > > >> Now without spellchecking :)
> >>>      > > >>
> >>>      > > >> This is not simple, e.g. for VMware. Each management
> >>>      > > >> server also acts as an agent proxy, so tasks against a
> >>>      > > >> particular ESX host will always be forwarded. The right
> >>>      > > >> answer would be to support a native “maintenance mode”
> >>>      > > >> for the management server. When entered into such a
> >>>      > > >> mode, the management server should release all agents
> >>>      > > >> (including SSVM), block/redirect API calls and login
> >>>      > > >> requests, and finish all async jobs it originated.
> >>>      > > >>
> >>>      > > >> On Apr 4, 2018, at 3:31 PM, Rafael Weingärtner <
> >>>      > > >> rafaelweingartner@gmail.com> wrote:
> >>>      > > >>
> >>>      > > >> Ilya, still regarding the issue of the management
> >>>      > > >> server that is being shut down: if other MSs (or maybe
> >>>      > > >> system VMs - I do not know if they are able to do such
> >>>      > > >> tasks) can direct/redirect/send new jobs to this
> >>>      > > >> management server (the one being shut down), the process
> >>>      > > >> might never end, because new tasks are always being
> >>>      > > >> created for the management server that we want to shut
> >>>      > > >> down. Is this scenario possible?
> >>>      > > >>
> >>>      > > >> That is why I mentioned blocking port 8250 for the
> >>>      > > >> “graceful-shutdown”.
> >>>      > > >>
> >>>      > > >> If this scenario is not possible, then everything is
> >>>      > > >> fine.
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> On Wed, Apr 4, 2018 at 7:14 PM, ilya musayev <
> >>>      > > ilya.mailing.lists@gmail.com
> >>>      > > >> <mailto:ilya.mailing.lists@gmail.com>>
> >>>      > > >> wrote:
> >>>      > > >>
> >>>      > > >> I'm thinking of using the configuration value from
> >>>      > > >> "job.cancel.threshold.minutes" - it will be the longest:
> >>>      > > >>
> >>>      > > >>    "category": "Advanced",
> >>>      > > >>    "description": "Time (in minutes) for async-jobs to be
> >>>      > > >> forcely cancelled if it has been in process for long",
> >>>      > > >>    "name": "job.cancel.threshold.minutes",
> >>>      > > >>    "value": "60"
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <
> >>>      > > >> rafaelweingartner@gmail.com> wrote:
> >>>      > > >>
> >>>      > > >> Big +1 for this feature; I only have a few doubts.
> >>>      > > >>
> >>>      > > >> * Regarding the tasks/jobs that management servers
> >>>      > > >> (MSs) execute: do these tasks originate from requests
> >>>      > > >> that come to the MS, or is it possible for requests
> >>>      > > >> received by one management server to be executed by
> >>>      > > >> another? I mean, if I execute a request against MS1,
> >>>      > > >> will this request always be executed/treated by MS1, or
> >>>      > > >> is it possible that this request is executed by another
> >>>      > > >> MS (e.g. MS2)?
> >>>      > > >>
> >>>      > > >> * I would suggest that after we block traffic coming to
> >>>      > > >> 8080/8443/8250 (we will need to block this last one as
> >>>      > > >> well, right?), we log the execution of tasks. I mean,
> >>>      > > >> something saying: there are XXX tasks (enumerate tasks)
> >>>      > > >> still being executed; we will wait for them to finish
> >>>      > > >> before shutting down.
> >>>      > > >>
> >>>      > > >> * The timeout (60 minutes suggested) could be a global
> >>>      > > >> setting that we load before executing the
> >>>      > > >> graceful-shutdown.
> >>>      > > >>
> >>>      > > >> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <
> >>>      > > >> ilya.mailing.lists@gmail.com> wrote:
> >>>      > > >>
> >>>      > > >> Use case:
> >>>      > > >> In any environment - from time to time - an
> >>>      > > >> administrator needs to perform maintenance. The current
> >>>      > > >> stop sequence of the CloudStack management server will
> >>>      > > >> ignore the fact that there may be long-running async
> >>>      > > >> jobs - and terminate the process. This in turn can
> >>>      > > >> create a poor user experience and occasional
> >>>      > > >> inconsistency in the CloudStack DB.
> >>>      > > >>
> >>>      > > >> This is especially painful in large environments where
> >>>      > > >> the user has thousands of nodes and continuous patching
> >>>      > > >> happens around the clock - which requires migration of
> >>>      > > >> workloads from one node to another.
> >>>      > > >>
> >>>      > > >> With that said - I've created a script that monitors
> >>>      > > >> the async job queue for a given MS and waits for it to
> >>>      > > >> complete all jobs. More details are posted below.
> >>>      > > >>
> >>>      > > >> I'd like to introduce "graceful-shutdown" into the
> >>>      > > >> systemctl/service handling of the cloudstack-management
> >>>      > > >> service.
> >>>      > > >>
> >>>      > > >> The details of how it will work are below:
> >>>      > > >>
> >>>      > > >> Workflow for graceful shutdown:
> >>>      > > >> Using iptables/firewalld, block any connection attempts
> >>>      > > >> on 8080/8443 (we can identify the ports dynamically).
> >>>      > > >> Identify the MSID for the node; using the proper msid,
> >>>      > > >> query the async_job table for:
> >>>      > > >> 1) any jobs that are still running (job_status = “0”)
> >>>      > > >> 2) job_dispatcher not like “pseudoJobDispatcher”
> >>>      > > >> 3) job_init_msid = $my_ms_id
> >>>      > > >>
> >>>      > > >> Monitor this async_job table for 60 minutes - until all
> >>>      > > >> async jobs for the MSID are done - then proceed with the
> >>>      > > >> shutdown. If it fails for any reason or is terminated,
> >>>      > > >> catch the exit via the trap command and unblock
> >>>      > > >> 8080/8443.
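A rough shell sketch of the workflow above (the msid value is a placeholder, the mysql call assumes credentials in ~/.my.cnf, and nothing is executed on load - graceful_stop must be invoked explicitly):

```shell
#!/bin/sh
# Sketch of the graceful-shutdown workflow; MSID is a placeholder for
# this node's entry in the mshost table.
MSID="${MSID:-345049098386}"

# Query from the thread: async jobs still running that this MS dispatched
SQL="SELECT COUNT(*) FROM cloud.async_job
     WHERE job_status = 0
       AND job_dispatcher NOT LIKE 'pseudoJobDispatcher'
       AND job_init_msid = ${MSID}"

pending_jobs() { mysql -N -e "$SQL"; }

block_api()   { iptables -I INPUT -p tcp -m multiport --dports 8080,8443 -j REJECT; }
unblock_api() { iptables -D INPUT -p tcp -m multiport --dports 8080,8443 -j REJECT; }

graceful_stop() {
  trap unblock_api EXIT            # unblock on failure or termination
  block_api
  waited=0                         # poll for up to 60 minutes
  while [ "$(pending_jobs)" -gt 0 ] && [ "$waited" -lt 3600 ]; do
    sleep 60
    waited=$((waited + 60))
  done
  systemctl stop cloudstack-management
}
```

The trap guarantees the ports are unblocked even if the script is killed mid-wait, which is the abort path described above.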
> >>>      > > >>
> >>>      > > >> Comments are welcome
> >>>      > > >>
> >>>      > > >> Regards,
> >>>      > > >> ilya
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> --
> >>>      > > >> Rafael Weingärtner
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >>
> >>>      > > >> --
> >>>      > > >> Rafael Weingärtner
> >>>      > > >>
> >>>      > >
> >>>      >
> >>>      >
> >>>      >
> >>>      > --
> >>>      >
> >>>      > Andrija Panić
> >>>      >
> >>>
> >>>
> >>>
> >>>
> >>
> > --
> > Ron Wheeler
> > President
> > Artifact Software Inc
> > email: rwheeler@artifact-software.com
> > skype: ronaldmwheeler
> > phone: 866-970-2435, ext 102
> >
> >
>
>
> --
> Rafael Weingärtner
>
