geode-dev mailing list archives

From Mark Hanson <mhan...@pivotal.io>
Subject Re: [Proposal] Make gfsh "stop server" command synchronous
Date Wed, 11 Sep 2019 21:24:09 GMT
Good question. I will have to look into that.

Thanks,
Mark

> On Sep 11, 2019, at 10:53 AM, Dan Smith <dsmith@pivotal.io> wrote:
> 
>> The idea I am working with at the moment that Kirk pointed me at was to
>> use the pid file in the directory as indicator. Once that file disappears
>> the server is stopped.
> 
> How will this work if stop server --member is invoked from a different
> machine than the member that is being stopped?
> 
> -Dan
> 
> On Wed, Sep 11, 2019 at 10:28 AM Mark Hanson <mhanson@pivotal.io> wrote:
> 
>> The idea I am working with at the moment that Kirk pointed me at was to
>> use the pid file in the directory as indicator. Once that file disappears
>> the server is stopped.
>> 
>> That seems to work in my testing.
>> 
>> Thoughts?
>> 
>> Thanks,
>> Mark
>> 
>>> On Sep 11, 2019, at 10:23 AM, Dan Smith <dsmith@pivotal.io> wrote:
>>> 
>>> It does seem like we should make stop synchronous, or at least make start
>>> wait for the old process to die as Bruce suggested. Otherwise it is
>>> difficult for someone to script the restart of a server.
>>> 
>>> Looking at the code, it does look like gfsh stop is asynchronous. There are
>>> multiple ways to stop a server:
>>> * gfsh stop --dir - it looks like we write out some stop file and return
>>> immediately. Or, if we can connect over JMX, we invoke the
>>> MemberMBean.shutDownMember method, which launches a thread to close the
>>> cache, which is also asynchronous.
>>> * gfsh stop --pid - this seems to be similar to --dir
>>> * With a member name - this appears to go to the MemberMBean.shutDownMember
>>> method as well.
>>> 
>>> I think one issue is that, with the JMX methods for stopping the server, it
>>> may be hard to ensure the process is really gone, because they can be
>>> invoked remotely. That may be why they are asynchronous - they need to
>>> return something to the caller before shutting down. So maybe Bruce's
>>> suggestion is better.
>>> 
>>> As Jens pointed out - tests should generally just use port 0 for servers.
>>> 
>>> -Dan
>>> 
>>> On Wed, Sep 11, 2019 at 8:46 AM Jens Deppe <jensdeppe@apache.org> wrote:
>>> 
>>>> To circle back to the original test failure that prompted this discussion -
>>>> the failing test was getting intermittent bind exceptions on subsequent
>>>> server restarts.
>>>> 
>>>> I believe it's quite likely that a process' ports will remain unavailable
>>>> even after it is gone (I'm not sure if we create listening sockets with
>>>> SO_REUSEADDR). So, as to John's comment that gfsh is already synchronous, I
>>>> don't think that adding extra functionality to gfsh, to ultimately just
>>>> wait longer before exiting, is really solving the problem. I'd suggest you
>>>> adjust the tests to always start servers with `--server-port=0` so that
>>>> there are no port conflicts and let the OS handle it.
>>>> 
>>>> --Jens
>>>> 
>>>> On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt <bschuchardt@pivotal.io>
>>>> wrote:
>>>> 
>>>>> Blocking or non-blocking, I don't have a strong opinion.  What I'd
>>>>> really like to have gfsh ensure, though, is that no-one is able to start
>>>>> a new instance of a server while the old process is still around. Maybe
>>>>> the PID file is the way to do that.
>>>>> 
>>>>> On 9/10/19 3:08 PM, Mark Hanson wrote:
>>>>>> Hello All,
>>>>>> 
>>>>>> I would like to propose that we make the gfsh “stop server” command
>>>>>> synchronous. It is causing some issues with some tests as the rest of the
>>>>>> calls are blocking. Stop, on the other hand, returns immediately.
>>>>>> This causes issues as shown in GEODE-7017 specifically.
>>>>>> 
>>>>>> GEODE-7017 CI failure:
>>>>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest > startupReportsOnlineOnlyAfterRedundancyRestored
>>>>>> https://issues.apache.org/jira/browse/GEODE-7017
>>>>>> 
>>>>>> 
>>>>>> What do people think?
>>>>>> 
>>>>>> Thanks,
>>>>>> Mark
>>>>> 
>>>> 
>> 
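The pid-file idea discussed above (wait until the server's pid file disappears) can be sketched roughly as follows. This is only an illustration in Python, not Geode or gfsh code: the function name, the timeout and polling values, and the example path are all assumptions. As Dan's question points out, a check like this only works when the caller can see the stopped member's working directory, so it does not cover a stop invoked from a different machine.

```python
import time
from pathlib import Path


def wait_for_pid_file_removal(pid_file, timeout=30.0, poll_interval=0.1):
    """Poll until pid_file no longer exists.

    Returns True if the file disappeared before the timeout elapsed,
    False otherwise. The name and defaults are hypothetical.
    """
    path = Path(pid_file)
    deadline = time.monotonic() + timeout
    while path.exists():
        if time.monotonic() >= deadline:
            return False  # file is still present; treat the stop as unconfirmed
        time.sleep(poll_interval)
    return True


if __name__ == "__main__":
    # Hypothetical path; the real pid file name and location are not given in this thread.
    stopped = wait_for_pid_file_removal("server1/server.pid", timeout=5.0)
    print("stopped" if stopped else "still running (or timed out)")
```

A script that restarts a server could call something like this between the stop and the start, and fail loudly on a False result instead of starting a second instance.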
>> 
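Jens's suggestion to start test servers with `--server-port=0` relies on the operating system handing out a free port when a socket binds to port 0. A minimal sketch of that behaviour with plain Python sockets is below. It is not Geode code, and the helper name and default host are assumptions made for illustration.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening TCP socket to port 0 and return (socket, assigned_port).

    Binding to port 0 asks the OS to choose an unused port, so a restart
    never collides with a port that a previous process may still hold.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    assigned_port = sock.getsockname()[1]  # the port the OS actually picked
    return sock, assigned_port


if __name__ == "__main__":
    first, first_port = bind_ephemeral()
    second, second_port = bind_ephemeral()
    print("OS-assigned ports:", first_port, second_port)
    first.close()
    second.close()
```

With this approach the test has to read the assigned port back from the server rather than hard-coding it, which is the trade-off for avoiding bind exceptions on restart.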

