geode-dev mailing list archives

From Anthony Baker <aba...@pivotal.io>
Subject Re: 2 minute gateway startup time due to GEODE-5591
Date Wed, 05 Sep 2018 17:35:22 GMT
Before this improvement is re-merged I’d like to see:

1) A test that characterizes the current behavior (e.g. doesn’t wait 2 min when there’s
a port conflict)
2) A test that demonstrates how the current logic is insufficient

Anthony


> On Sep 5, 2018, at 10:20 AM, Nabarun Nag <nnag@apache.org> wrote:
> 
> GEODE-5591 has been reverted in develop
> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
> 
> Regards
> Nabarun Nag
> 
> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcmahon@pivotal.io> wrote:
> 
>> +1 for reverting in both places.
>> 
>> I see that there is already an isGatewayReceiver flag in the AcceptorImpl
>> constructor.  It's not ideal, but could we use this flag to prevent the 2
>> minute retry logic from happening if this flag is true?
>> 
>> Ryan
>> 
>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <lhughesgodfrey@pivotal.io> wrote:
>> 
>>> +1 for reverting in both places.
>>> 
>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsmith@pivotal.io> wrote:
>>> 
>>>> +1 for reverting in both places. The current fix is not better, that's why
>>>> we are reverting it on the release branch!
>>>> 
>>>> -Dan
>>>> 
>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarrett@pivotal.io> wrote:
>>>> 
>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in develop.
>>>>> We shouldn’t go backwards in develop. The current fix is better than the
>>>>> bug it fixes.
>>>>> 
>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag <nnag@apache.org> wrote:
>>>>>> 
>>>>>> If everyone is okay with it, I will revert that change in develop and then
>>>>>> cherry pick it to release/1.7.0 branch.
>>>>>> Please do comment.
>>>>>> 
>>>>>> Regards
>>>>>> Nabarun Nag
>>>>>> 
>>>>>> 
>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsmith@pivotal.io> wrote:
>>>>>>> 
>>>>>>> +1 to yank it and rework the fix.
>>>>>>> 
>>>>>>> Gester's change helps, but it just means that you will sometimes randomly
>>>>>>> have a 2 minute delay starting up a gateway receiver. I don't think that is
>>>>>>> a great user experience either.
>>>>>>> 
>>>>>>> -Dan
>>>>>>> 
>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <bschuchardt@pivotal.io> wrote:
>>>>>>> 
>>>>>>>> Let's yank it
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote:
>>>>>>>>> 
>>>>>>>>> If it's to get the release out, I'm fine with reverting. I don't like it,
>>>>>>>>> but I'm not willing to die on that hill. :)
>>>>>>>>> 
>>>>>>>>> -S.
>>>>>>>>> 
>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsmith@pivotal.io> wrote:
>>>>>>>>> 
>>>>>>>>>> Splitting this into a separate thread.
>>>>>>>>>> 
>>>>>>>>>> I see the issue. The two minute timeout is in the constructor for
>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes.
>>>>>>>>>> 
>>>>>>>>>> That behavior makes sense for CacheServer.start.
>>>>>>>>>> 
>>>>>>>>>> But it doesn't make sense for the new logic in GatewayReceiver.start()
>>>>>>>>>> from GEODE-5591. That code is trying to use CacheServer.start to scan for
>>>>>>>>>> an available port, trying each port in a range. That free port finding
>>>>>>>>>> logic really doesn't want to have two minutes of retries for each port. It
>>>>>>>>>> seems like we need to rework the fix for GEODE-5591.
>>>>>>>>>> 
>>>>>>>>>> Does it make sense to hold up the release to rework this fix, or should we
>>>>>>>>>> just revert it? Have we switched concourse over to using alpine linux,
>>>>>>>>>> which I think was the original motivation for this fix?
>>>>>>>>>> 
>>>>>>>>>> -Dan
>>>>>>>>>> 
>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsmith@pivotal.io> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Why is it waiting at all in this case? Where is this 2 minute timeout
>>>>>>>>>>> coming from?
>>>>>>>>>>> 
>>>>>>>>>>> -Dan
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlagadda@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> So the issue is that it takes longer to start than previous releases?
>>>>>>>>>>>> Also, is this wait time only when using Gfsh to create
>>>>>>>>>>>> gateway-receiver?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <nnag@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Currently we have a minor issue in the release branch as pointed out by
>>>>>>>>>>>>> Barry O.
>>>>>>>>>>>>> We will wait till a resolution is figured out for this issue.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>> 1. create locator
>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404
>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405
>>>>>>>>>>>>> 4. create gateway-receiver --member=server1
>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document it? When we revert
>>>>>>>>>>>>> GEODE-5591, this issue does not happen.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Nabarun Nag
>>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
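
The diagnosis quoted above (a bind retry of up to two minutes in the AcceptorImpl constructor, paid once per candidate port when GatewayReceiver.start() scans a range) can be illustrated with a minimal sketch. It uses only java.net.ServerSocket and is not Geode code: the class and method names (PortScanSketch, tryBind, bindWithRetry, scan) are hypothetical, and the 50 ms pause between attempts is an assumption made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; none of these names exist in Geode.
public class PortScanSketch {

  // One bind attempt: returns the bound socket, or null if the bind fails
  // (for example because another socket already holds the port).
  static ServerSocket tryBind(int port) {
    ServerSocket socket = null;
    try {
      socket = new ServerSocket();
      socket.bind(new InetSocketAddress(port));
      return socket;
    } catch (IOException e) {
      closeQuietly(socket);
      return null;
    }
  }

  // Retries the same port until retryMillis have elapsed.
  // retryMillis == 0 means a single attempt, i.e. fail fast.
  static ServerSocket bindWithRetry(int port, long retryMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + retryMillis * 1_000_000L;
    while (true) {
      ServerSocket socket = tryBind(port);
      if (socket != null || System.nanoTime() - deadline >= 0) {
        return socket;
      }
      Thread.sleep(50); // assumed pause between attempts
    }
  }

  // Scans startPort..endPort (inclusive) and returns the first socket bound.
  // Every busy port costs up to retryMillis before the scan moves on.
  static ServerSocket scan(int startPort, int endPort, long retryMillis)
      throws InterruptedException {
    for (int port = startPort; port <= endPort; port++) {
      ServerSocket socket = bindWithRetry(port, retryMillis);
      if (socket != null) {
        return socket;
      }
    }
    return null;
  }

  private static void closeQuietly(ServerSocket socket) {
    if (socket != null) {
      try {
        socket.close();
      } catch (IOException ignored) {
        // nothing useful to do in a sketch
      }
    }
  }
}
```

With retryMillis set to 120000, a scan that meets one busy port stalls for about two minutes before trying the next one, which matches the delay reported in step 5 of the reproduction. Passing 0 for the scanning caller while keeping the retry for an ordinary server start is one way to read the isGatewayReceiver suggestion, though how that would map onto the real AcceptorImpl is not shown here.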

