geode-dev mailing list archives

From Sai Boorlagadda <sai_boorlaga...@apache.org>
Subject Re: 2 minute gateway startup time due to GEODE-5591
Date Wed, 05 Sep 2018 16:49:50 GMT
+1 to reverting in 1.7 and leaving the fix on develop.

On Wed, Sep 5, 2018 at 9:47 AM Jacob Barrett <jbarrett@pivotal.io> wrote:

> I’m not ok with reverting in develop. Revert in 1.7 and modify in develop.
> We shouldn’t go backwards in develop. The current fix is better than the
> bug it fixes.
>
> > On Sep 5, 2018, at 9:40 AM, Nabarun Nag <nnag@apache.org> wrote:
> >
> > If everyone is okay with it, I will revert that change in develop and then
> > cherry pick it to release/1.7.0 branch.
> > Please do comment.
> >
> > Regards
> > Nabarun Nag
> >
> >
> >> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsmith@pivotal.io> wrote:
> >>
> >> +1 to yank it and rework the fix.
> >>
> >> Gester's change helps, but it just means that you will sometimes randomly
> >> have a 2 minute delay starting up a gateway receiver. I don't think that is
> >> a great user experience either.
> >>
> >> -Dan
> >>
> >> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <bschuchardt@pivotal.io> wrote:
> >>
> >>> Let's yank it
> >>>
> >>>
> >>>
> >>>> On 9/4/18 5:04 PM, Sean Goller wrote:
> >>>>
> >>>> If it's to get the release out, I'm fine with reverting. I don't like it,
> >>>> but I'm not willing to die on that hill. :)
> >>>>
> >>>> -S.
> >>>>
> >>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsmith@pivotal.io> wrote:
> >>>>
> >>>>> Splitting this into a separate thread.
> >>>>>
> >>>>> I see the issue. The two minute timeout is in the constructor for
> >>>>> AcceptorImpl, where it retries the bind for 2 minutes.
> >>>>>
> >>>>> That behavior makes sense for CacheServer.start.
> >>>>>
> >>>>> But it doesn't make sense for the new logic in GatewayReceiver.start()
> >>>>> from GEODE-5591. That code is trying to use CacheServer.start to scan for
> >>>>> an available port, trying each port in a range. That free port finding
> >>>>> logic really doesn't want to have two minutes of retries for each port. It
> >>>>> seems like we need to rework the fix for GEODE-5591.
> >>>>>
> >>>>> Does it make sense to hold up the release to rework this fix, or should
> >>>>> we just revert it? Have we switched Concourse over to using Alpine Linux,
> >>>>> which I think was the original motivation for this fix?
> >>>>>
> >>>>> -Dan
> >>>>>
> >>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsmith@pivotal.io> wrote:
> >>>>>
> >>>>>> Why is it waiting at all in this case? Where is this 2 minute timeout
> >>>>>> coming from?
> >>>>>>
> >>>>>> -Dan
> >>>>>>
> >>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlagadda@gmail.com> wrote:
> >>>>>>
> >>>>>>> So the issue is that it takes longer to start than previous releases?
> >>>>>>> Also, is this wait time only when using Gfsh to create
> >>>>>>> gateway-receiver?
> >>>>>>>
> >>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <nnag@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Currently we have a minor issue in the release branch as pointed out by
> >>>>>>>> Barry O.
> >>>>>>>> We will wait till a resolution is figured out for this issue.
> >>>>>>>>
> >>>>>>>> Steps:
> >>>>>>>> 1. create locator
> >>>>>>>> 2. start server --name=server1 --server-port=40404
> >>>>>>>> 3. start server --name=server2 --server-port=40405
> >>>>>>>> 4. create gateway-receiver --member=server1
> >>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
> >>>>>>>>
> >>>>>>>> Is the 2 minute wait time acceptable? Should we document it? When we revert
> >>>>>>>> GEODE-5591, this issue does not happen.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Nabarun Nag
> >>>>>>>>
> >>>>>>>>
> >>>
> >>
>
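
For readers following the thread: the point Dan raises (scanning a port range wants a single bind attempt per port, not a timed retry loop on each one) can be illustrated with a minimal sketch in plain java.net. This is not Geode code and not the eventual fix for GEODE-5591. The class name PortScanSketch, the helper names tryBindOnce and bindFirstFree, and the port range used in main are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative only: not Geode's AcceptorImpl or GatewayReceiver code.
final class PortScanSketch {

    /** Makes exactly one bind attempt; returns the bound socket, or null if the bind fails. */
    static ServerSocket tryBindOnce(int port) {
        ServerSocket socket = null;
        try {
            socket = new ServerSocket();
            socket.bind(new InetSocketAddress(port));
            return socket;
        } catch (IOException bindFailed) {
            // Port is unavailable: clean up and report failure right away, with no retry.
            if (socket != null) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing an unbound socket fails.
                }
            }
            return null;
        }
    }

    /** Walks the inclusive range and returns a socket bound to the first port that binds. */
    static ServerSocket bindFirstFree(int startPort, int endPort) throws IOException {
        for (int port = startPort; port <= endPort; port++) {
            ServerSocket bound = tryBindOnce(port);
            if (bound != null) {
                return bound;
            }
            // Otherwise move straight on to the next port in the range.
        }
        throw new IOException("No free port in range " + startPort + "-" + endPort);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical range; two consecutive scans should land on different ports.
        try (ServerSocket first = bindFirstFree(5000, 5500);
             ServerSocket second = bindFirstFree(5000, 5500)) {
            System.out.println("first=" + first.getLocalPort()
                + " second=" + second.getLocalPort());
        }
    }
}
```

With this shape, a taken port costs one failed bind rather than a two minute wait, which is the behavior the range scan in the thread appears to need. A retry-until-deadline bind can still be appropriate when the caller asked for one specific port, as with CacheServer.start.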
