lucene-solr-user mailing list archives

From "Kai 'wusel' Siering" <>
Subject How to recover from failed SPLITSHARD?
Date Thu, 28 Sep 2017 23:11:11 GMT

This is with SolrCloud 6.5.1 on Ubuntu LTS 16.04 and OpenJDK 8, 4 Solr nodes in Cloud mode, external

I tried to split my collection's shard1 (500 GB) with SPLITSHARD, and it kind of worked. After
more than 8 hours the new shards left "construction" state and entered "recovery" :( About
12 hours after that, Out of Memory errors with "could not create thread" happened. Node
took leadership of shard1, but since we still saw errors on searches, I stopped solr on,
changed heap from 24G to 31G and rebooted the system, just in case; it was a good time to install
the latest patches. came back and shards shard1, shard1_0 and shard1_1 started recovery.
But unfortunately,, leader for shard2 which was being split as well, hit "something":
solr.log was no longer updated and the UI stopped working, so in the end I stopped solr
there as well (finished instantly) and rebooted. Now both are running with 31G java heap,
shard1 and shard2 are synced and I am trying to clean up before retrying.

Of shard2, only a shard2_0 without any replicas was left over, and DELETESHARD cleaned it up.

But shard1 has shard1_0 and shard1_1, each with two replicas. DELETESHARD errored out, so
I ran DELETEREPLICA on all of them. This worked, but "parts of" shard1_0 and shard1_1 are still there
and I cannot delete them:

$ wget -q -O - ''
| jq
          "shard1_0": {
            "range": "80000000-bfffffff",
            "state": "recovery_failed",
            "replicas": {}
          },
          "shard1_1": {
            "parent": "shard1",
            "shard_parent_node": "",
            "range": "c0000000-ffffffff",
            "state": "recovery_failed",
            "shard_parent_zk_session": "98682039611162624",
            "replicas": {}
          }

$ wget -O - ''
--2017-09-29 01:01:16--
Connecting to connected.
HTTP request sent, awaiting response... 400 Bad Request
2017-09-29 01:01:16 ERROR 400: Bad Request.
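
The request URLs in the two commands above did not survive in this copy of the message. As a rough sketch only, Collections API calls for the actions named here (CLUSTERSTATUS, DELETEREPLICA, DELETESHARD) generally have the shape below. The base URL, the collection name and the replica name core_node5 are placeholders assumed for illustration, not values from the original post.

```shell
#!/bin/sh
# Assumed placeholders: the real host and collection name are not in the message.
SOLR_BASE="http://localhost:8983/solr"
COLLECTION="mycollection"

# Build a Collections API URL for an action plus optional key=value parameters.
# Only prints the URL; nothing is sent to Solr.
collections_url() {
    action="$1"
    shift
    url="${SOLR_BASE}/admin/collections?action=${action}&collection=${COLLECTION}&wt=json"
    for param in "$@"; do
        url="${url}&${param}"
    done
    printf '%s\n' "$url"
}

# Example invocations (shard names are from the message, core_node5 is hypothetical):
#   collections_url CLUSTERSTATUS
#   collections_url DELETEREPLICA shard=shard1_0 replica=core_node5
#   collections_url DELETESHARD shard=shard1_0
# To actually send a request:
#   wget -q -O - "$(collections_url CLUSTERSTATUS)" | jq
```

Printing the URL first rather than sending it makes it easy to check the shard and replica names before running a destructive action such as DELETESHARD.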

Any hint on how to fix this appreciated ;)

