samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abhishek S <abks...@gmail.com>
Subject Re: Problem: upgrade 1.2 to 1.3 caused loss of clean shutdown on SIGTERM
Date Sat, 11 Jan 2020 05:44:00 GMT
Hey Thunder,

Sorry for the late reply, Yes, this works fine in cluster mode as the hooks
are attached in ContainerLaunchUtil (used in cluster mode).
You're right, for the Standalone case the shutdownHook can be attached in
LocalApplicationRunner.run() and removed as part of
LocalApplicationRunner.kill().
I'd be more than happy to review your patch if you want to work on this.
I see that you also mentioned that your own shutdownHook were not called on
SIGTERM - can you help me understand how and where this was hooked up to
the runtime ?

Thanks,
Abhishek


On Wed, Jan 8, 2020 at 7:06 AM Thunder Stumpges <thunder.stumpges@gmail.com>
wrote:

> Thanks Abhishek,
>
> I believe you are correct; removing the shutdown hook from inside the
> container was the problem. I took the shutdown hook code removed from
> SamzaContainer in your commit #83e152904ef5 and pulled it out to our App
> Runner, calling LocalApplicationRunner.kill() and then
> LocalApplicationRunner.waitForFinish(timeout) and I think that has restored
> all of the shutdown sequencing.
>
> Where do you think this belongs (in the scope of SAMZA-2426 that you
> created)? Maybe in LocalApplicationRunner itself? You said you already took
> care of shutdown sequence in cluster mode, yes? I'm open to helping on this
> one, just let me know.
>
> thanks,
> Thunder
>
>
>
> On Tue, Jan 7, 2020 at 7:12 PM Abhishek S <abkshvn@gmail.com> wrote:
>
> > The rational behind moving the shutdown handler out of SamzaContainer was
> > to let standalone jobs (which are usually part of other online
> > applications) maintain their own shutdown hooks.
> > This prevents Samza hooks from running in parallel or causing a deadlock
> > with the shutdown hooks of parent application that uses Samza in
> standalone
> > mode (as a library).
> >
> > That being said, I agree that Containers should attempt graceful shutdown
> > and wait at-most "task.shutdown.ms" on SIGTERM.
> > I created SAMZA-2426 to investigate the issue further and track the work
> > required.
> >
> > Abhishek
> >
> >
> >
> >
> >
> >
> > On Tue, Jan 7, 2020 at 1:55 PM Brett Konold <bkonold@linkedin.com.invalid
> >
> > wrote:
> >
> > > Thunder,
> > >
> > > How were you able to determine that the shutdown hooks are not being
> > > called?
> > >
> > > If you're able to share any of your shutdown logs from before and after
> > > your 1.3 upgrade, that would be helpful while I try to reproduce this
> > issue
> > > myself.
> > >
> > > Brett
> > > ________________________________
> > > From: Brett Konold <bkonold@linkedin.com>
> > > Sent: Tuesday, January 7, 2020 1:25 PM
> > > To: dev@samza.apache.org <dev@samza.apache.org>
> > > Subject: Re: Problem: upgrade 1.2 to 1.3 caused loss of clean shutdown
> on
> > > SIGTERM
> > >
> > > Hey Thunder,
> > >
> > > Thanks for reporting. Taking a look into this and will get back to you
> > > when I have something.
> > >
> > > Brett
> > > ________________________________
> > > From: Thunder Stumpges <thunder.stumpges@gmail.com>
> > > Sent: Monday, January 6, 2020 6:54 PM
> > > To: dev@samza.apache.org <dev@samza.apache.org>
> > > Subject: Problem: upgrade 1.2 to 1.3 caused loss of clean shutdown on
> > > SIGTERM
> > >
> > > We are attempting to upgrade from samza 1.2 to 1.3 in hopes of fixing
> > >
> > >
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSAMZA-2198&amp;data=02%7C01%7Cbkonold%40linkedin.com%7C3d5e0f0bf97e4cc633de08d7931d06ad%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637139625187723288&amp;sdata=uwKMfKxQmS9uWS62Hew814P%2Fja7FpaViNTVzYrp03Ok%3D&amp;reserved=0
> > > where there was a deadlock
> > > in the shutdown code which prevented completing a clean shutdown.
> > >
> > > After the upgrade, it appears like NONE of the shutdown hooks / code
> are
> > > being called and it just immediately shuts down.
> > >
> > > We are running stand alone Low Level Tasks with LocalApplicationRunner
> /
> > > ZkJobCoordinator in Docker / K8s.
> > >
> > > When killing from docker for testing, we use docker kill -s SIGTERM
> > > <container> to send SIGTERM instead of SIGKILL. This was working in
> samza
> > > 1.2 (other than the deadlock from the above issue).
> > >
> > > Any ideas what changed?
> > >
> > > Thanks,
> > > Thunder
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message