From: Jon Hawkesworth
To: solr-user@lucene.apache.org
Subject: solrcloud 6.0.1 any suggestions for fixing a replica that stubbornly remains down
Date: Thu, 25 Aug 2016 21:00:53 +0000

Has anyone got any suggestions for how I can fix up my SolrCloud 6.0.1 replica-remains-down issue?

 

Today we stopped all the loading and querying, brought down all 4 Solr nodes, went into ZooKeeper and deleted everything under /collections/transcribedReports/leader_initiated_recovery/shard1/, and brought the cluster back up (this seemed to be a reasonably similar situation to https://issues.apache.org/jira/browse/SOLR-7021, where this workaround is described, albeit for an older version of Solr).
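In case the detail helps, the ZooKeeper clean-up amounted to the following (shown here as a Python/kazoo sketch; the ZooKeeper addresses are placeholders, not our real ensemble):

    from kazoo.client import KazooClient

    # Placeholder ZooKeeper ensemble - substitute the real hosts (and chroot,
    # if Solr is using one).
    zk = KazooClient(hosts='zookeeper1:2181,zookeeper2:2181,zookeeper3:2181')
    zk.start()

    # Delete everything under the shard1 leader_initiated_recovery node,
    # as per the SOLR-7021 workaround.
    base = '/collections/transcribedReports/leader_initiated_recovery/shard1'
    if zk.exists(base):
        for child in zk.get_children(base):
            zk.delete(base + '/' + child, recursive=True)

    zk.stop()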

 

After a while things looked OK, but when we attempted to move the second replica back to the original node (by creating a third replica and then deleting the temporary one, which wasn't on the node we wanted it on), we immediately got a 'down' status on that node (and it's stayed that way ever since), with 'Could not publish as ACTIVE after succesful recovery' messages appearing in the logs.
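In Collections API terms, the replica move was roughly the following (a sketch only; the host, node name and replica/core_node name are placeholders):

    import requests

    solr = 'http://solr1:8983/solr'  # placeholder host

    # 1. Add a new (third) replica of shard1 on the node we want it back on.
    requests.get(solr + '/admin/collections', params={
        'action': 'ADDREPLICA',
        'collection': 'transcribedReports',
        'shard': 'shard1',
        'node': 'solr1:8983_solr',      # placeholder node name
    })

    # 2. Then delete the temporary replica that was sitting on the wrong node.
    #    The replica parameter takes the core_node name from the cluster state.
    requests.get(solr + '/admin/collections', params={
        'action': 'DELETEREPLICA',
        'collection': 'transcribedReports',
        'shard': 'shard1',
        'replica': 'core_node5',        # placeholder replica name
    })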

 

It's as if there is something specifically wrong with that node that stops us from ever having a functioning replica of shard1 on it.

 

The weird thing is that shard2 on the same (problematic) node seems fine.

 

Other stuff we have tried includes the following (rough sketches of the relevant calls are below the list):

 

issuing a REQUESTRECOVERY

moving from 2 to 4 nodes

adding more replicas on other nodes (new replicas immediately go into down state and stay that way).
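For reference, the recovery request and the extra replicas were issued along these lines (core and node names are placeholders):

    import requests

    solr = 'http://solr1:8983/solr'  # placeholder host

    # REQUESTRECOVERY is a Core Admin action, so it takes the core name
    # of the replica that is stuck in the down state.
    requests.get(solr + '/admin/cores', params={
        'action': 'REQUESTRECOVERY',
        'core': 'transcribedReports_shard1_replica2',   # placeholder core name
    })

    # Adding a further replica of shard1 on another node - these are the
    # replicas that immediately go into the down state and stay there.
    requests.get(solr + '/admin/collections', params={
        'action': 'ADDREPLICA',
        'collection': 'transcribedReports',
        'shard': 'shard1',
        'node': 'solr3:8983_solr',      # placeholder node name
    })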

 

The system is SolrCloud 6.0.1 running on 4 nodes. There's 1 collection with 4 shards, and I'm trying to have 2 replicas on each of the 4 nodes.

Currently each shard is managing approx. 1.2 million docs (mostly just text, usually 10-20k in size each).
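For completeness, a CREATE call that would give that layout looks roughly like this (purely illustrative; the host name is a placeholder and this may not be exactly how the collection was originally created):

    import requests

    # A CREATE call matching the layout described above: 4 shards with
    # replicationFactor 2, i.e. 8 cores spread 2-per-node across 4 nodes.
    requests.get('http://solr1:8983/solr/admin/collections', params={
        'action': 'CREATE',
        'name': 'transcribedReports',
        'numShards': 4,
        'replicationFactor': 2,
        'maxShardsPerNode': 2,
    })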

 

Any suggestions would be greatly appreciated.

 

Many thanks,

 

Jon

 

 

Jon Hawkesworth
Software Developer

 

 

Hanley Road, Malvern, WR13 6NP. UK

O: +44 (0) 1684 312313
jon.hawkesworth@mmodal.com
www.mmodal.com

 

This electronic mail transmission contains confidential information intended only for the person(s) named. Any use, distribution, copying or disclosure by another person is strictly prohibited. If you are not the intended recipient of this e-mail, promptly delete it and all attachments.

 
