Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Date: Wed, 27 Dec 2017 09:46:00 +0000 (UTC)
From: "Samuel Tatipamula (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Message-ID: <JIRA.13127222.1514302681000.535766.1514367960085@Atlassian.JIRA>
In-Reply-To: <JIRA.13127222.1514302681000@Atlassian.JIRA>
References: <JIRA.13127222.1514302681000@Atlassian.JIRA> <JIRA.13127222.1514302681571@jira-lw-us.apache.org>
Subject: [jira] [Commented] (SOLR-11794) PULL replicas stop replicating
 after schema push and RELOAD collection action
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 27 Dec 2017 09:46:05 -0000


    [ https://issues.apache.org/jira/browse/SOLR-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304397#comment-16304397 ] 

Samuel Tatipamula commented on SOLR-11794:
------------------------------------------

I tried calling the RELOAD collection API without changing any config files, and I was able to reproduce the same issue.
I was also able to reproduce the issue in 7.2.
It also reproduces when the schema change is added via the /schema (v2) API, which is expected given that the API internally calls the reload collection API on all nodes.
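
For anyone who wants to try the reproduction, here is a minimal sketch of how the RELOAD request can be built. The endpoint path and the collection name "sample" come from the issue description below; the base URL http://localhost:8983 and the helper name reload_url are assumptions for illustration only.

```python
from urllib.parse import urlencode


def reload_url(base_url, collection):
    """Build the Collections API RELOAD URL for one collection.

    base_url is an assumption (e.g. "http://localhost:8983"); the path and
    parameters follow the request quoted in the issue description.
    """
    query = urlencode({"action": "RELOAD", "name": collection})
    return base_url.rstrip("/") + "/solr/admin/collections?" + query


if __name__ == "__main__":
    # Print the URL instead of sending it, so the sketch has no side effects.
    print(reload_url("http://localhost:8983", "sample"))
```

The resulting URL can be requested with any HTTP client (curl, a browser, or urllib.request.urlopen). No config or schema change is needed before the call to trigger the problem described here.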

I'm really surprised that nobody else has noticed this issue until now.

> PULL replicas stop replicating after schema push and RELOAD collection action
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-11794
>                 URL: https://issues.apache.org/jira/browse/SOLR-11794
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: replication (java), Schema and Analysis, SolrCloud, update
>    Affects Versions: 7.1, 7.2
>         Environment: Linux version 2.6.32-642.15.1.el6.x86_64 (mockbuild@c1bm.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Fri Feb 24 14:31:22 UTC 2017
>            Reporter: Samuel Tatipamula
>            Priority: Critical
>              Labels: patch
>
> h3. *UPDATE*
> PULL replica replication stops after calling the RELOAD collection API, even without any config/schema changes!
> It's also happening when schema API is used to add a new field.
> Setup: a running SolrCloud cluster with NRT, TLOG, and PULL replicas.
> Solr - 7.1.0
> ZK - 3.4.10
> Used config set - sample_techproducts_configs
> Shards - 1
> Whenever a schema change (adding of new fields/changing field types) is pushed to ZK and the collection is reloaded using
> /solr/admin/collections?action=RELOAD&name=sample, the index changes stop replicating to PULL replicas. NRT and TLOG are able to replicate the index.
> Before the schema change, I can see the indexFetcher thread running on the PULL replica:
> 2017-12-26 10:17:11.802 INFO  (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Master's generation: 2
> 2017-12-26 10:17:11.802 INFO  (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Master's version: 1514283298419
> 2017-12-26 10:17:11.802 INFO  (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave's generation: 2
> 2017-12-26 10:17:11.802 INFO  (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave's version: 1514283298419
> 2017-12-26 10:17:11.802 INFO  (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave in sync with master.
> After that, the following schema change is made to the managed-schema of sample_techproducts_configs and pushed to ZK, and the collection is reloaded.
> <field name="testpoint_1" type="point" indexed="true" stored="true"/>
> <field name="testpoint_2" type="point" indexed="true" stored="true"/>
> <field name="testpoint_3" type="point" indexed="true" stored="true"/>
> After the reload, I can no longer see the IndexFetcher thread running on the PULL replica, and no further logs are printed. The log ends with the collection reload entry:
> 2017-12-26 10:22:09.256 INFO  (qtp128526626-16) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=sample_shard1_replica_p5&qt=/admin/cores&action=RELOAD&wt=javabin&version=2} status=0 QTime=624
> The index is never modified after this, and the leader doesn't receive any polls from the PULL replica.
> Observations:
> - Manually forcing an index fetch using /replication?command=fetchindex syncs the index, but doesn't restart the IndexFetcher polling.
> - Restarting the replica syncs the index and restarts the IndexFetcher thread and polling.
> - Removing the replica and adding it back as PULL syncs the index and restarts the IndexFetcher thread and polling.
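
Since the only visible symptom is that IndexFetcher log lines stop appearing on the PULL replica, a small log check can help confirm the stall. The sketch below assumes the log line format quoted in the description above; the helper names and the max_gap_seconds threshold are hypothetical, and the threshold should be chosen relative to the replica's actual poll interval, which is not stated in this report.

```python
import re
from datetime import datetime

# Timestamp, thread name and message of an IndexFetcher log line,
# in the format shown in the quoted logs.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+\s+"
    r"\((?P<thread>[^)]+)\) .*? o\.a\.s\.h\.IndexFetcher (?P<msg>.*)$"
)


def last_poll(lines):
    """Return the timestamp of the newest IndexFetcher log line, or None."""
    latest = None
    for line in lines:
        m = LINE.match(line)
        if m and m.group("thread").startswith("indexFetcher"):
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            if latest is None or ts > latest:
                latest = ts
    return latest


def polling_stalled(lines, now, max_gap_seconds):
    """True if no IndexFetcher line was logged within max_gap_seconds of now."""
    last = last_poll(lines)
    return last is None or (now - last).total_seconds() > max_gap_seconds
```

Feeding it the PULL replica's log and the current time flags the replica as stalled once the gap since the last poll exceeds the chosen threshold. It only reads logs and does not restart polling.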

