geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nabarun Nag <n...@apache.org>
Subject Re: 2 minute gateway startup time due to GEODE-5591
Date Wed, 05 Sep 2018 17:20:44 GMT
GEODE-5591 has been reverted in develop
ref: 901da27f227a8ce2b7d6b681619782a1accd9330

Regards
Nabarun Nag

On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcmahon@pivotal.io> wrote:

> +1 for reverting in both places.
>
> I see that there is already an isGatewayReceiver flag in the AcceptorImpl
> constructor.  It's not ideal, but could we use this flag to prevent the 2
> minute retry logic for happening if this flag is true?
>
> Ryan
>
> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
> lhughesgodfrey@pivotal.io> wrote:
>
> > +1 for reverting in both places.
> >
> > On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsmith@pivotal.io> wrote:
> >
> > > +1 for reverting in both places. The current fix is not better, that's
> > why
> > > we are reverting it on the release branch!
> > >
> > > -Dan
> > >
> > > On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarrett@pivotal.io>
> > wrote:
> > >
> > > > I’m not ok with reverting in develop. Revert in 1.7 and modify in
> > > develop.
> > > > We shouldn’t go backwards in develop. The current fix is better than
> > the
> > > > bug it fixes.
> > > >
> > > > > On Sep 5, 2018, at 9:40 AM, Nabarun Nag <nnag@apache.org> wrote:
> > > > >
> > > > > If everyone is okay with it, I will revert that change in develop
> and
> > > > then
> > > > > cherry pick it to release/1.7.0 branch.
> > > > > Please do comment.
> > > > >
> > > > > Regards
> > > > > Nabarun Nag
> > > > >
> > > > >
> > > > >> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsmith@pivotal.io>
> wrote:
> > > > >>
> > > > >> +1 to yank it and rework the fix.
> > > > >>
> > > > >> Gester's change helps, but it just means that you will sometimes
> > > > randomly
> > > > >> have a 2 minute delay starting up a gateway receiver. I don't
> think
> > > > that is
> > > > >> a great user experience either.
> > > > >>
> > > > >> -Dan
> > > > >>
> > > > >> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <
> > > > bschuchardt@pivotal.io>
> > > > >> wrote:
> > > > >>
> > > > >>> Let's yank it
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>> On 9/4/18 5:04 PM, Sean Goller wrote:
> > > > >>>>
> > > > >>>> If it's to get the release out, I'm fine with reverting.
I don't
> > > like
> > > > >> it,
> > > > >>>> but I'm not willing to die on that hill. :)
> > > > >>>>
> > > > >>>> -S.
> > > > >>>>
> > > > >>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsmith@pivotal.io>
> > wrote:
> > > > >>>>
> > > > >>>> Spitting this into a separate thread.
> > > > >>>>>
> > > > >>>>> I see the issue. The two minute timeout is the constructor
for
> > > > >>>>> AcceptorImpl, where it retries to bind for 2 minutes.
> > > > >>>>>
> > > > >>>>> That behavior makes sense for CacheServer.start.
> > > > >>>>>
> > > > >>>>> But it doesn't make sense for the new logic in
> > > > GatewayReceiver.start()
> > > > >>>>> from
> > > > >>>>> GEODE-5591. That code is trying to use CacheServer.start
to
> scan
> > > for
> > > > an
> > > > >>>>> available port, trying each port in a range. That
free port
> > finding
> > > > >> logic
> > > > >>>>> really doesn't want to have two minutes of retries
for each
> port.
> > > It
> > > > >>>>> seems
> > > > >>>>> like we need to rework the fix for GEODE-5591.
> > > > >>>>>
> > > > >>>>> Does it make sense to hold up the release to rework
this fix,
> or
> > > > should
> > > > >>>>> we
> > > > >>>>> just revert it? Have we switched concourse over to
using alpine
> > > > linux,
> > > > >>>>> which I think was the original motivation for this
fix?
> > > > >>>>>
> > > > >>>>> -Dan
> > > > >>>>>
> > > > >>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsmith@pivotal.io>
> > > wrote:
> > > > >>>>>
> > > > >>>>> Why is it waiting at all in this case? Where is this
2 minute
> > > timeout
> > > > >>>>>> coming from?
> > > > >>>>>>
> > > > >>>>>> -Dan
> > > > >>>>>>
> > > > >>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda
<
> > > > >>>>>>
> > > > >>>>> sai.boorlagadda@gmail.com
> > > > >>>>>
> > > > >>>>>> wrote:
> > > > >>>>>>> So the issue is that it takes longer to start
than previous
> > > > releases?
> > > > >>>>>>> Also, is this wait time only when using Gfsh
to create
> > > > >>>>>>> gateway-receiver?
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag
<nnag@apache.org>
> > > > wrote:
> > > > >>>>>>>
> > > > >>>>>>> Currently we have a minor issue in the release
branch as
> > pointed
> > > > out
> > > > >>>>>>>>
> > > > >>>>>>> by
> > > > >>>>>
> > > > >>>>>> Barry O.
> > > > >>>>>>>> We will wait till a resolution is figured
out for this
> issue.
> > > > >>>>>>>>
> > > > >>>>>>>> Steps:
> > > > >>>>>>>> 1. create locator
> > > > >>>>>>>> 2. start server --name=server1 --server-port=40404
> > > > >>>>>>>> 3. start server --name=server2 --server-port=40405
> > > > >>>>>>>> 4. create gateway-receiver --member=server1
> > > > >>>>>>>> 5. create gateway-receiver --member=server2
`This gets stuck
> > > for 2
> > > > >>>>>>>>
> > > > >>>>>>> minutes`
> > > > >>>>>>>
> > > > >>>>>>>> Is the 2 minute wait time acceptable?
Should we document it?
> > > When
> > > > we
> > > > >>>>>>>>
> > > > >>>>>>> revert
> > > > >>>>>>>
> > > > >>>>>>>> GEODE-5591, this issue does not happen.
> > > > >>>>>>>>
> > > > >>>>>>>> Regards
> > > > >>>>>>>> Nabarun Nag
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message