geode-dev mailing list archives

From Udo Kohlmeyer <...@apache.org>
Subject Re: 2 minute gateway startup time due to GEODE-5591
Date Wed, 05 Sep 2018 17:55:40 GMT
Thank you. I must have missed that :)


On 9/5/18 10:54, Nabarun Nag wrote:
> @Udo I mentioned in an earlier mail that it will be reverted in
> develop and then cherry-picked to release/1.7.0. The release/1.7.0 branch has
> not been published yet, as it is undergoing preliminary tests before the
> release candidate is published.
>
> Regards
> Nabarun Nag
>
> On Wed, Sep 5, 2018 at 10:46 AM Udo Kohlmeyer <udo@apache.org> wrote:
>
>> Did we also revert this in 1.7? I assume we did, but it is not directly
>> stated here.
>>
>>
>> On 9/5/18 10:20, Nabarun Nag wrote:
>>> GEODE-5591 has been reverted in develop
>>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
>>>
>>> Regards
>>> Nabarun Nag
>>>
>>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon <rmcmahon@pivotal.io> wrote:
>>>> +1 for reverting in both places.
>>>>
>>>> I see that there is already an isGatewayReceiver flag in the AcceptorImpl
>>>> constructor.  It's not ideal, but could we use this flag to prevent the 2
>>>> minute retry logic from happening if this flag is true?
>>>>
>>>> Ryan
>>>>
>>>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <
>>>> lhughesgodfrey@pivotal.io> wrote:
>>>>
>>>>> +1 for reverting in both places.
>>>>>
>>>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsmith@pivotal.io> wrote:
>>>>>
>>>>>> +1 for reverting in both places. The current fix is not better, that's why
>>>>>> we are reverting it on the release branch!
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarrett@pivotal.io> wrote:
>>>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in develop.
>>>>>>> We shouldn’t go backwards in develop. The current fix is better than the
>>>>>>> bug it fixes.
>>>>>>>
>>>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag <nnag@apache.org> wrote:
>>>>>>>>
>>>>>>>> If everyone is okay with it, I will revert that change in develop and then
>>>>>>>> cherry pick it to release/1.7.0 branch.
>>>>>>>> Please do comment.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Nabarun Nag
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsmith@pivotal.io> wrote:
>>>>>>>>> +1 to yank it and rework the fix.
>>>>>>>>>
>>>>>>>>> Gester's change helps, but it just means that you will sometimes randomly
>>>>>>>>> have a 2 minute delay starting up a gateway receiver. I don't think that is
>>>>>>>>> a great user experience either.
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <bschuchardt@pivotal.io> wrote:
>>>>>>>>>
>>>>>>>>>> Let's yank it
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote:
>>>>>>>>>>>
>>>>>>>>>>> If it's to get the release out, I'm fine with reverting. I don't like it,
>>>>>>>>>>> but I'm not willing to die on that hill. :)
>>>>>>>>>>>
>>>>>>>>>>> -S.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsmith@pivotal.io> wrote:
>>>>>>>>>>>> Splitting this into a separate thread.
>>>>>>>>>>>>
>>>>>>>>>>>> I see the issue. The two minute timeout is in the constructor for
>>>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes.
>>>>>>>>>>>>
>>>>>>>>>>>> That behavior makes sense for CacheServer.start.
>>>>>>>>>>>>
>>>>>>>>>>>> But it doesn't make sense for the new logic in GatewayReceiver.start()
>>>>>>>>>>>> from GEODE-5591. That code is trying to use CacheServer.start to scan for
>>>>>>>>>>>> an available port, trying each port in a range. That free port finding
>>>>>>>>>>>> logic really doesn't want to have two minutes of retries for each port. It
>>>>>>>>>>>> seems like we need to rework the fix for GEODE-5591.
>>>>>>>>>>>>
>>>>>>>>>>>> Does it make sense to hold up the release to rework this fix, or should
>>>>>>>>>>>> we just revert it? Have we switched concourse over to using alpine linux,
>>>>>>>>>>>> which I think was the original motivation for this fix?
>>>>>>>>>>>>
>>>>>>>>>>>> -Dan
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsmith@pivotal.io> wrote:
>>>>>>>>>>>>> Why is it waiting at all in this case? Where is this 2 minute timeout
>>>>>>>>>>>>> coming from?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlagadda@gmail.com> wrote:
>>>>>>>>>>>>>> So the issue is that it takes longer to start than previous releases?
>>>>>>>>>>>>>> Also, is this wait time only when using Gfsh to create
>>>>>>>>>>>>>> gateway-receiver?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <nnag@apache.org> wrote:
>>>>>>>>>>>>>>> Currently we have a minor issue in the release branch as pointed out by
>>>>>>>>>>>>>>> Barry O.
>>>>>>>>>>>>>>> We will wait till a resolution is figured out for this issue.
>>>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>>>> 1. create locator
>>>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404
>>>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405
>>>>>>>>>>>>>>> 4. create gateway-receiver --member=server1
>>>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document it? When we revert
>>>>>>>>>>>>>>> GEODE-5591, this issue does not happen.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Nabarun Nag
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>
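
For readers following the thread, here is a minimal sketch of the problem Dan describes: a port scan that calls a bind routine which retries for a fixed period on every port. It is not Geode code. The class PortScanSketch and the methods bindWithRetry and bindFirstFree are hypothetical names made up for illustration, and plain java.net.ServerSocket stands in for AcceptorImpl and CacheServer.start. The retry budget is passed as a parameter, which is roughly what Ryan's isGatewayReceiver suggestion would amount to: keep the long retry for a cache server on a fixed port, and use no retry when scanning a range.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustration only: hypothetical names, not the Geode implementation.
public class PortScanSketch {

  // Try to bind to one port, retrying until retryMillis has elapsed.
  // A retryMillis of 0 means a single attempt.
  static ServerSocket bindWithRetry(int port, long retryMillis) throws IOException {
    long deadline = System.currentTimeMillis() + retryMillis;
    while (true) {
      ServerSocket socket = new ServerSocket();
      try {
        socket.bind(new InetSocketAddress(port));
        return socket;
      } catch (IOException e) {
        socket.close();
        if (System.currentTimeMillis() >= deadline) {
          throw e; // retry budget used up, report the bind failure
        }
        try {
          Thread.sleep(50); // short pause before the next attempt
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
  }

  // Scan a port range and return a socket bound to the first free port.
  // Every busy port costs up to retryMillisPerPort, so a 2 minute budget
  // makes each occupied port in the range add 2 minutes to startup.
  static ServerSocket bindFirstFree(int startPort, int endPort, long retryMillisPerPort)
      throws IOException {
    IOException last = null;
    for (int port = startPort; port <= endPort; port++) {
      try {
        return bindWithRetry(port, retryMillisPerPort);
      } catch (IOException e) {
        last = e; // port is busy, move on to the next one
      }
    }
    throw new IOException("no free port in range " + startPort + "-" + endPort, last);
  }
}
```

With a per-port budget of 0 the scan skips an occupied port immediately. With a budget of 120000 milliseconds, the same scan would stall for about two minutes on each occupied port before moving on, which matches the delay seen in step 5 of the reproduction steps.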

