lucene-solr-user mailing list archives

From Hendrik Haddorp <hendrik.hadd...@gmx.net>
Subject Re: solr shutdown
Date Sat, 22 Oct 2016 13:57:04 GMT
Thanks, I assume there is some issue on my side, as I did not actually
find any of the messages that the Solr script logs during shutdown.
The shutdown also happened much faster than the 5-second delay in the
script, so I'm doing something wrong. Anyhow, thanks for the further
details; they should give me enough to investigate further.

On 22.10.2016 15:22, Erick Erickson wrote:
> bq: Would a clean shutdown result in the node being flagged as down
> in the cluster state straight away?
>
> It should, if it's truly clean. HOWEVER..... a "clean shutdown" is
> unfortunately not just a "bin/solr stop" because of the timeout Shawn
> mentioned, see SOLR-9371. It's a simple edit to make it much longer,
> but the real fix should poll. The "smoking gun" would be a correlation
> between the node not being marked as down in state.json and a message
> when you stop the instance with bin/solr about "forcefully killing
> ....."
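>
> A quick way to look for that correlation (a rough sketch; the ports,
> log file name, and collection layout are placeholders for your
> install):
>
>   # stop one node and capture what the script prints
>   bin/solr stop -p 8983 | tee stop.log
>   grep -i "forcefully" stop.log
>
>   # from another still-live node, check whether the stopped node's
>   # replicas were actually marked down in state.json
>   curl -s "http://localhost:8984/solr/admin/collections?action=CLUSTERSTATUS&wt=json" \
>     | grep -o '"state":"[a-z]*"'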
>
> After only 5 seconds, that script forcefully kills the instance of
> Solr, which would _not_ flag the replicas it hosts as down. After an
> interval you should see it disappear from the live_nodes znode,
> though. The problem, of course, is that part of a graceful shutdown
> is each replica updating the associated state.json, and they don't
> get a chance. ZK will periodically ping the Solr instance and, if it
> times out, remove the associated znode from live_nodes.
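>
> You can watch that happen directly in ZooKeeper (a sketch; the
> zkCli.sh path, ensemble address, and collection name are assumptions
> for your setup):
>
>   $ZOOKEEPER_HOME/bin/zkCli.sh -server zk1:2181 ls /live_nodes
>   $ZOOKEEPER_HOME/bin/zkCli.sh -server zk1:2181 get \
>     /collections/mycollection/state.json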
>
> Solr code checks both state.json and live_nodes to know whether a
> node is truly functioning; being absent from live_nodes trumps
> whatever state is in state.json.
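>
> From the outside you can approximate the same check with the
> Collections API (a sketch; the jq paths assume the 6.x CLUSTERSTATUS
> response shape, and "mycollection" is a placeholder):
>
>   STATUS=$(curl -s "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json")
>   LIVE=$(echo "$STATUS" | jq -r '.cluster.live_nodes[]')
>   # a replica only counts as usable if it is active AND its node is live
>   echo "$STATUS" \
>     | jq -r '.cluster.collections.mycollection.shards[].replicas[]
>              | "\(.node_name) \(.state)"' \
>     | while read -r node state; do
>         if [ "$state" = active ] && echo "$LIVE" | grep -qx "$node"; then
>           echo "$node: usable"
>         else
>           echo "$node: not usable (state=$state)"
>         fi
>       done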
>
> Best,
> Erick
>
> On Sat, Oct 22, 2016 at 1:00 AM, Hendrik Haddorp
> <hendrik.haddorp@gmx.net> wrote:
>> Thanks, that was what I was hoping for; I just didn't see any
>> indication of that in the normal log output.
>>
>> The reason for asking is that I have a SolrCloud 6.2.1 setup, and
>> when doing a rolling restart of the nodes I sometimes get errors. So
>> far I have seen two different things:
>> 1) The node starts up again and is able to receive new replicas, but
>> all existing replicas are broken.
>> 2) All nodes come up and no problems are seen in the cluster status,
>> but the admin UI on one node claims that a file for one config set
>> is missing. Restarting the node resolves the issue.
>>
>> This looked to me like the node is not going down cleanly. Would a
>> clean shutdown result in the node being flagged as down in the
>> cluster state straight away? So far, the ZooKeeper data only gets
>> updated once the node comes up again and reports itself as down
>> before the recovery starts.
>>
>> On 21.10.2016 15:01, Shawn Heisey wrote:
>>> On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
>>>> I'm running SolrCloud in foreground mode (-f). Does it make a
>>>> difference to Solr whether I stop it by pressing ctrl-c, sending
>>>> it a SIGTERM, or using "solr stop"?
>>> All of those should produce the same result in the end -- Solr's
>>> shutdown hook will be called and a graceful shutdown will commence.
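>>>
>>> For example, all of these end up in the same shutdown hook (the
>>> pid-file location is an assumption; check SOLR_PID_DIR for your
>>> install):
>>>
>>>    kill -TERM "$(cat bin/solr-8983.pid)"   # same as ctrl-c in -f mode
>>>    bin/solr stop -p 8983
>>>
>>> Only SIGKILL (kill -9) bypasses the hook, so no graceful shutdown
>>> happens in that case.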
>>>
>>> Note that in the case of the "bin/solr stop" command, the default
>>> is to wait only five seconds for graceful shutdown before
>>> proceeding to a forced kill, which, for a typical install, means
>>> that forced kills become the norm rather than the exception.  We
>>> have an issue open to increase the maximum timeout, but it hasn't
>>> been done yet.
>>>
>>> I strongly recommend that anyone going into production edit the
>>> script to increase the timeout.  For the shell script I would use
>>> at least 60 seconds.  The Windows script just does a pause, not an
>>> intelligent wait, so going that high probably isn't advisable on
>>> Windows.
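>>>
>>> The kind of edit I mean, as a sketch (variable names are
>>> placeholders, not the actual bin/solr contents): replace the fixed
>>> sleep with a poll that waits up to 60 seconds before falling back
>>> to a forced kill:
>>>
>>>    STOP_WAIT=60
>>>    waited=0
>>>    while kill -0 "$SOLR_PID" 2>/dev/null; do
>>>      if [ "$waited" -ge "$STOP_WAIT" ]; then
>>>        echo "Solr did not exit within ${STOP_WAIT}s; forcefully killing $SOLR_PID"
>>>        kill -9 "$SOLR_PID"
>>>        break
>>>      fi
>>>      sleep 2
>>>      waited=$((waited + 2))
>>>    done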
>>>
>>> Thanks,
>>> Shawn
>>>

