lucene-dev mailing list archives

From "Pushkar Raste (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9446) Just replicated index goes into replication recovery on leader failure even if index was not changed
Date Mon, 12 Sep 2016 14:10:20 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484221#comment-15484221
] 

Pushkar Raste commented on SOLR-9446:
-------------------------------------

When I ran the test a couple of times, I did see that even a freshly replicated index could
become the leader. I think that check is unnecessary for the test: irrespective of which node
becomes the leader, we should never go into replication if the index was unchanged.

> Just replicated index goes into replication recovery on leader failure even if index
was not changed
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9446
>                 URL: https://issues.apache.org/jira/browse/SOLR-9446
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: replication (java)
>            Reporter: Pushkar Raste
>            Assignee: Noble Paul
>            Priority: Minor
>
>  We noticed this issue while migrating a Solr index from machines {{A1, A2 and A3}} to
{{B1, B2 and B3}}. We followed the steps below (there were no updates during the migration
process).
> * Index had replicas on machines {{A1, A2, A3}}. Let's say {{A1}} was the leader at the
time
> * We added 3 more replicas {{B1, B2 and B3}}. These nodes synced with the leader by replication.
These fresh nodes do not have tlogs.
> * We shut down one of the old nodes ({{A3}}). 
> * We then shut down the leader ({{A1}})
> * A new leader was elected (let's say {{A2}})
> * Leader asked all the replicas to sync with it
> * Fresh nodes (the ones without tlogs) first tried PeerSync, but since there was no frame
of reference, PeerSync failed and the fresh nodes fell back to replication
> Although replication would not copy all the segments again, it seems like we can
short-circuit the sync to put nodes back in the active state as soon as possible.
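
The short circuit proposed above can be sketched roughly as follows. This is only an
illustration of the idea, not Solr's actual recovery code: IndexState, RecoveryAction and
chooseAction are hypothetical names, and the fields used to compare two indexes (highest
version, document count, checksum) are assumptions about what such a comparison might use.

```java
import java.util.Objects;

/** Hypothetical summary of an index's contents; not a Solr class. */
final class IndexState {
    final long maxVersion;   // assumed: highest document version in the index
    final long numDocs;      // assumed: number of documents in the index
    final String checksum;   // assumed: hash computed over the index contents

    IndexState(long maxVersion, long numDocs, String checksum) {
        this.maxVersion = maxVersion;
        this.numDocs = numDocs;
        this.checksum = checksum;
    }

    /** True when both summaries describe the same index contents. */
    boolean sameAs(IndexState other) {
        return other != null
                && maxVersion == other.maxVersion
                && numDocs == other.numDocs
                && Objects.equals(checksum, other.checksum);
    }
}

/** Hypothetical outcomes of a sync request from a new leader. */
enum RecoveryAction { NONE, PEER_SYNC, REPLICATION }

final class RecoverySketch {
    /**
     * Decide what a replica should do when the new leader asks it to sync.
     * The first check is the short circuit: an unchanged index needs no recovery,
     * even if the replica has no tlog to use as a frame of reference.
     */
    static RecoveryAction chooseAction(IndexState leader, IndexState replica, boolean replicaHasTlog) {
        if (leader.sameAs(replica)) {
            return RecoveryAction.NONE;        // index unchanged: stay active
        }
        if (replicaHasTlog) {
            return RecoveryAction.PEER_SYNC;   // tlog available: try the cheaper sync first
        }
        return RecoveryAction.REPLICATION;     // no tlog and index differs: full replication
    }

    public static void main(String[] args) {
        IndexState leader = new IndexState(100L, 42L, "abc");
        IndexState freshReplica = new IndexState(100L, 42L, "abc"); // just replicated, no tlog
        System.out.println(chooseAction(leader, freshReplica, false));
    }
}
```

With this ordering, a node like {{B1}} that has no tlog but holds an index identical to the
new leader's would skip both PeerSync and replication.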



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

