From: Udo Kohlmeyer
To: dev@geode.apache.org
Date: Wed, 5 Sep 2018 10:55:40 -0700
Subject: Re: 2 minute gateway startup time due to GEODE-5591

Thank you. I must have missed that :)

On 9/5/18 10:54, Nabarun Nag wrote:
> @Udo I have mentioned in an earlier mail that it will be reverted in
> develop and then cherry-picked to release/1.7.0.
> The release/1.7.0 branch has not been published yet, as it is undergoing
> preliminary tests before the release candidate is published.
>
> Regards
> Nabarun Nag
>
> On Wed, Sep 5, 2018 at 10:46 AM Udo Kohlmeyer wrote:
>
>> Did we also revert this in 1.7? I assume it has been, but that is not
>> directly stated here.
>>
>>
>> On 9/5/18 10:20, Nabarun Nag wrote:
>>> GEODE-5591 has been reverted in develop
>>> ref: 901da27f227a8ce2b7d6b681619782a1accd9330
>>>
>>> Regards
>>> Nabarun Nag
>>>
>>> On Wed, Sep 5, 2018 at 10:14 AM Ryan McMahon wrote:
>>>> +1 for reverting in both places.
>>>>
>>>> I see that there is already an isGatewayReceiver flag in the
>>>> AcceptorImpl constructor. It's not ideal, but could we use this flag to
>>>> prevent the 2 minute retry logic from happening if this flag is true?
>>>>
>>>> Ryan
>>>>
>>>> On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey <lhughesgodfrey@pivotal.io> wrote:
>>>>
>>>>> +1 for reverting in both places.
>>>>>
>>>>> On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith wrote:
>>>>>
>>>>>> +1 for reverting in both places. The current fix is not better; that's
>>>>>> why we are reverting it on the release branch!
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett wrote:
>>>>>>> I’m not ok with reverting in develop. Revert in 1.7 and modify in
>>>>>>> develop. We shouldn’t go backwards in develop. The current fix is
>>>>>>> better than the bug it fixes.
>>>>>>>
>>>>>>> On Sep 5, 2018, at 9:40 AM, Nabarun Nag wrote:
>>>>>>>>
>>>>>>>> If everyone is okay with it, I will revert that change in develop
>>>>>>>> and then cherry-pick it to the release/1.7.0 branch.
>>>>>>>> Please do comment.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Nabarun Nag
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith wrote:
>>>>>>>>> +1 to yank it and rework the fix.
>>>>>>>>>
>>>>>>>>> Gester's change helps, but it just means that you will sometimes
>>>>>>>>> randomly have a 2 minute delay starting up a gateway receiver. I don't
>>>>>>>>> think that is a great user experience either.
>>>>>>>>>
>>>>>>>>> -Dan
>>>>>>>>>
>>>>>>>>> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt <bschuchardt@pivotal.io> wrote:
>>>>>>>>>
>>>>>>>>>> Let's yank it
>>>>>>>>>>
>>>>>>>>>> On 9/4/18 5:04 PM, Sean Goller wrote:
>>>>>>>>>>>
>>>>>>>>>>> If it's to get the release out, I'm fine with reverting. I don't like
>>>>>>>>>>> it, but I'm not willing to die on that hill. :)
>>>>>>>>>>>
>>>>>>>>>>> -S.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith wrote:
>>>>>>>>>>>> Splitting this into a separate thread.
>>>>>>>>>>>>
>>>>>>>>>>>> I see the issue. The two minute timeout is in the constructor for
>>>>>>>>>>>> AcceptorImpl, where it retries to bind for 2 minutes.
>>>>>>>>>>>>
>>>>>>>>>>>> That behavior makes sense for CacheServer.start.
>>>>>>>>>>>>
>>>>>>>>>>>> But it doesn't make sense for the new logic in GatewayReceiver.start()
>>>>>>>>>>>> from GEODE-5591. That code is trying to use CacheServer.start to scan
>>>>>>>>>>>> for an available port, trying each port in a range. That free port
>>>>>>>>>>>> finding logic really doesn't want to have two minutes of retries for
>>>>>>>>>>>> each port. It seems like we need to rework the fix for GEODE-5591.
>>>>>>>>>>>>
>>>>>>>>>>>> Does it make sense to hold up the release to rework this fix, or
>>>>>>>>>>>> should we just revert it? Have we switched Concourse over to using
>>>>>>>>>>>> Alpine Linux, which I think was the original motivation for this fix?
>>>>>>>>>>>>
>>>>>>>>>>>> -Dan
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith wrote:
>>>>>>>>>>>>> Why is it waiting at all in this case? Where is this 2 minute timeout
>>>>>>>>>>>>> coming from?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda <sai.boorlagadda@gmail.com> wrote:
>>>>>>>>>>>>>> So the issue is that it takes longer to start than in previous
>>>>>>>>>>>>>> releases? Also, does this wait only occur when using Gfsh to create
>>>>>>>>>>>>>> the gateway-receiver?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag wrote:
>>>>>>>>>>>>>>> Currently we have a minor issue in the release branch as pointed
>>>>>>>>>>>>>>> out by Barry O.
>>>>>>>>>>>>>>> We will wait till a resolution is figured out for this issue.
>>>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>>>> 1. create locator
>>>>>>>>>>>>>>> 2. start server --name=server1 --server-port=40404
>>>>>>>>>>>>>>> 3. start server --name=server2 --server-port=40405
>>>>>>>>>>>>>>> 4. create gateway-receiver --member=server1
>>>>>>>>>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck for 2 minutes`
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is the 2 minute wait time acceptable? Should we document it? When we
>>>>>>>>>>>>>>> revert GEODE-5591, this issue does not happen.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Nabarun Nag
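The interaction Dan describes in this thread can be illustrated with a small Java sketch that uses only java.net.ServerSocket. It is a hypothetical illustration under stated assumptions, not Geode's AcceptorImpl or GatewayReceiver code: the class PortScanSketch, the methods bind and bindFirstFree, and the retryOnBindFailure parameter are invented here, with retryOnBindFailure standing in for the role Ryan proposes for the existing isGatewayReceiver flag. If every candidate port in a scan were bound with a two-minute retry window, each port that is already taken could cost up to two minutes before the scan moves on, so k busy ports could add up to 2k minutes. That is consistent with the reproduction steps, where the second receiver stalls for about two minutes, presumably on a port the first receiver already holds. The sketch instead tries each port once.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.time.Duration;

// Hypothetical sketch, not Geode source: class, method, and parameter names are
// invented to illustrate the bind-retry versus port-scan interaction in this thread.
final class PortScanSketch {

  // Binds a new server socket to `port`. When the port is busy and
  // retryOnBindFailure is true, keeps retrying until retryTimeout elapses
  // (modeling the two-minute retry attributed to the AcceptorImpl constructor).
  // When it is false, the first BindException is rethrown immediately.
  static ServerSocket bind(int port, Duration retryTimeout, boolean retryOnBindFailure)
      throws IOException, InterruptedException {
    long deadline = System.nanoTime() + retryTimeout.toNanos();
    while (true) {
      ServerSocket socket = new ServerSocket();
      try {
        socket.bind(new InetSocketAddress(port));
        return socket;
      } catch (BindException e) {
        socket.close();
        if (!retryOnBindFailure || System.nanoTime() - deadline >= 0) {
          throw e;
        }
        Thread.sleep(100); // short pause before trying the same port again
      } catch (IOException e) {
        socket.close(); // any other I/O failure is not retried
        throw e;
      }
    }
  }

  // Scans [startPort, endPort] and returns a socket bound to the first free port.
  // Each port is tried once with retries disabled, so a busy port costs a single
  // failed bind instead of a full retry window.
  static ServerSocket bindFirstFree(int startPort, int endPort)
      throws IOException, InterruptedException {
    for (int port = startPort; port <= endPort; port++) {
      try {
        return bind(port, Duration.ZERO, false);
      } catch (BindException busy) {
        // Port already in use: move on to the next candidate.
      }
    }
    throw new BindException("No free port in range " + startPort + "-" + endPort);
  }

  public static void main(String[] args) throws Exception {
    // An already-bound socket stands in for the first receiver holding a port.
    try (ServerSocket occupied = new ServerSocket(0)) {
      int busyPort = occupied.getLocalPort();
      int endPort = Math.min(busyPort + 10, 65535);
      long start = System.nanoTime();
      try (ServerSocket receiver = bindFirstFree(busyPort, endPort)) {
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Bound port " + receiver.getLocalPort()
            + " after skipping busy port " + busyPort + " in " + elapsedMs + " ms");
      }
    }
  }
}
```

Returning the bound ServerSocket, rather than only a port number, means no other process can take the port between the availability check and the bind. The main method binds an ephemeral port first so the scan has exactly one known busy port to skip.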