Date: Wed, 27 Dec 2017 09:46:00 +0000 (UTC)
From: "Samuel Tatipamula (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Subject: [jira] [Commented] (SOLR-11794) PULL replicas stop replicating after schema push and RELOAD collection action

    [ https://issues.apache.org/jira/browse/SOLR-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304397#comment-16304397 ]

Samuel Tatipamula commented on SOLR-11794:
------------------------------------------

I tried calling the RELOAD collection API without changing any config files, and was able to reproduce the same issue.
I can reproduce the issue in 7.2 as well.
It also occurs when the schema change is added via the /schema (v2) API, which is expected, since that API internally calls the reload collection API on all nodes.
Really surprised that nobody else has noticed this issue until now.

> PULL replicas stop replicating after schema push and RELOAD collection action
> -----------------------------------------------------------------------------
>
> Key: SOLR-11794
> URL: https://issues.apache.org/jira/browse/SOLR-11794
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. 
Issues are Public)
> Components: replication (java), Schema and Analysis, SolrCloud, update
> Affects Versions: 7.1, 7.2
> Environment: Linux version 2.6.32-642.15.1.el6.x86_64 (mockbuild@c1bm.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Fri Feb 24 14:31:22 UTC 2017
> Reporter: Samuel Tatipamula
> Priority: Critical
> Labels: patch
>
> h3. *UPDATE*
> PULL replica replication stops after calling the RELOAD collection API, even without any config/schema changes!
> It's also happening when schema API is used to add a new field.
> An operating SolrCloud with NRT, TLOG, and PULL replicas.
> Solr - 7.1.0
> ZK - 3.4.10
> Used config set - sample_techproducts_configs
> Shards - 1
> Whenever a schema change (adding of new fields/changing field types) is pushed to ZK and the collection is reloaded using
> /solr/admin/collections?action=RELOAD&name=sample, the index changes stop replicating to PULL replicas. NRT and TLOG are able to replicate the index.
> Before the schema change, I can see the indexFetcher thread running on PULL replica
> 2017-12-26 10:17:11.802 INFO (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Master's generation: 2
> 2017-12-26 10:17:11.802 INFO (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Master's version: 1514283298419
> 2017-12-26 10:17:11.802 INFO (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave's generation: 2
> 2017-12-26 10:17:11.802 INFO (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave's version: 1514283298419
> 2017-12-26 10:17:11.802 INFO (indexFetcher-14-thread-1) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave in sync with master.
> After that, the following schema change was made to the managed-schema of sample_techproducts_configs, pushed to ZK, and the collection was reloaded.
>
>
>
> I can no longer see the IndexFetcher thread running on the PULL replica. No logs are printed. The logs end with the collection reload log
> 2017-12-26 10:22:09.256 INFO (qtp128526626-16) [c:sample s:shard1 r:core_node6 x:sample_shard1_replica_p5] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=sample_shard1_replica_p5&qt=/admin/cores&action=RELOAD&wt=javabin&version=2} status=0 QTime=624
> The index is never modified after this, and the leader doesn't receive polls from the PULL replica.
> Observations:
> - Manually forcing an index fetch using /replication?command=fetchindex syncs the index, but doesn't start the IndexFetcher polling.
> - Restarting the replica syncs the index and restarts the IndexFetcher thread and polling.
> - Removing the replica and adding it back as PULL syncs the index and restarts the IndexFetcher thread and polling.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
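
The symptom in the quoted report (IndexFetcher log lines stop once the core RELOAD entry appears) can be checked mechanically against a replica's log. The following is a minimal Python sketch and is not part of the original report. The function names, the regular expression, and the base URL http://localhost:8983/solr are assumptions for illustration; only the two endpoint paths, the core name, and the sample log lines come from the report.

```python
import re
from datetime import datetime
from urllib.parse import urlencode

# Assumed layout of a Solr log line, inferred from the lines quoted in the report:
# "<timestamp> <level> (<thread>) [<context>] <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+\w+\s+"
    r"\((?P<thread>[^)]+)\)\s+\[(?P<ctx>[^\]]*)\]\s+(?P<logger>\S+)\s+(?P<msg>.*)$"
)

# Two log lines copied from the report: the last poll and the core RELOAD.
SAMPLE_LOG = [
    "2017-12-26 10:17:11.802 INFO (indexFetcher-14-thread-1) [c:sample s:shard1 "
    "r:core_node6 x:sample_shard1_replica_p5] o.a.s.h.IndexFetcher Slave in sync with master.",
    "2017-12-26 10:22:09.256 INFO (qtp128526626-16) [c:sample s:shard1 r:core_node6 "
    "x:sample_shard1_replica_p5] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores "
    "params={core=sample_shard1_replica_p5&qt=/admin/cores&action=RELOAD&wt=javabin&version=2} "
    "status=0 QTime=624",
]


def reload_url(base, collection):
    """URL of the Collections API RELOAD call used in the report."""
    return f"{base}/admin/collections?" + urlencode({"action": "RELOAD", "name": collection})


def fetchindex_url(base, core):
    """URL of the manual fetchindex command on a replica core."""
    return f"{base}/{core}/replication?" + urlencode({"command": "fetchindex"})


def last_fetch_and_reload(lines):
    """Return (last IndexFetcher timestamp, last RELOAD timestamp); None if absent."""
    last_fetch = None
    last_reload = None
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if m["thread"].startswith("indexFetcher-"):
            last_fetch = ts
        elif "action=RELOAD" in m["msg"]:
            last_reload = ts
    return last_fetch, last_reload


def polling_stopped_after_reload(lines):
    """True if a RELOAD was logged and no IndexFetcher line follows it."""
    last_fetch, last_reload = last_fetch_and_reload(lines)
    if last_reload is None:
        return False
    return last_fetch is None or last_fetch < last_reload
```

Note that this check also returns True on a healthy replica in the window between a reload and the next scheduled poll, so it is only meaningful once at least one poll interval has passed since the RELOAD entry.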