lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Dead node, but clusterstate.json says active, won't sync on restart
Date Wed, 29 Jan 2014 17:08:41 GMT
What's in the logs of the node that won't recover on restart after clearing the index and tlog?


- Mark

On Jan 29, 2014, at 11:41 AM, Greg Preston <gpreston@marinsoftware.com> wrote:

>> If you removed the tlog and index and restart it should resync, or
>> something is really crazy.
> 
> It doesn't, or at least if it tries, it's somehow failing.  I'd be ok with
> the sync failing for some reason if the node wasn't also serving queries.
> 
> 
> -Greg
> 
> 
> On Tue, Jan 28, 2014 at 11:10 AM, Mark Miller <markrmiller@gmail.com> wrote:
>> 
>> Sounds like a bug. 4.6.1 is out any minute - you might try that. There was
>> a replication bug that may be involved.
>> 
>> If you removed the tlog and index and restart it should resync, or
>> something is really crazy.
>> 
>> The clusterstate.json is a red herring. You have to merge the live nodes
>> info with the state to know the real state.
>> 
>> - Mark
>> 
>> http://www.about.me/markrmiller
>> 
>> On Jan 28, 2014, at 12:31 PM, Greg Preston <gpreston@marinsoftware.com> wrote:
>>> 
>>> ** Using solrcloud 4.4.0 **
>>> 
>>> I had to kill a running solrcloud node.  There is still a replica for that
>>> shard, so everything is functional.  We've done some indexing while the
>>> node was killed.
>>> 
>>> I'd like to bring back up the downed node and have it resync from the other
>>> replica.  But when I restart the downed node, it joins back up as active
>>> immediately, and doesn't resync.  I even wiped the data directory on the
>>> downed node, hoping that would force it to sync on restart, but it doesn't.
>>> 
>>> I'm assuming this is related to the state still being listed as active in
>>> clusterstate.json for the downed node?  Since it comes back as active, it's
>>> serving queries and giving old results.
>>> 
>>> How can I force this node to do a recovery on restart?
>>> 
>>> Thanks.
>>> 
>>> 
>>> -Greg
>> 
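
Mark's point above, that clusterstate.json has to be merged with the live nodes info to get the real state, can be illustrated with a small sketch. This is not Solr code. It assumes the clusterstate.json layout used by the 4.x releases (collection -> "shards" -> shard -> "replicas" -> replica, each replica carrying "state" and "node_name") and assumes the live nodes are available as a plain list of node names, as they would appear as children of the /live_nodes znode. The function name, the host names, and the sample data are hypothetical, and fetching the data from ZooKeeper is left out.

```python
def effective_replica_states(clusterstate, live_nodes):
    """Merge recorded replica states with the set of live nodes.

    A replica whose node is not in live_nodes is reported as "down",
    whatever state is recorded for it in the cluster state.
    """
    live = set(live_nodes)
    result = {}
    for coll_name, coll in clusterstate.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                recorded = replica.get("state", "unknown")
                if replica.get("node_name") in live:
                    state = recorded
                else:
                    state = "down"
                result[(coll_name, shard_name, replica_name)] = state
    return result


# Hypothetical sample: both replicas are recorded as "active",
# but only hostA is present in the live nodes list.
sample_state = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {"state": "active", "node_name": "hostA:8983_solr"},
                    "core_node2": {"state": "active", "node_name": "hostB:8983_solr"},
                }
            }
        }
    }
}
sample_live_nodes = ["hostA:8983_solr"]

states = effective_replica_states(sample_state, sample_live_nodes)
for key, state in sorted(states.items()):
    print(key, state)
```

In this sample, core_node2 is still recorded as "active" but comes out as "down" because its node is missing from the live nodes list, which is the sense in which the recorded state alone is a red herring.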
