lucene-solr-user mailing list archives

From Bill Oconnor <bocon...@plos.org>
Subject Replicates not recovering after rolling restart
Date Wed, 20 Sep 2017 20:42:54 GMT
Hello,


Background:


We have been using Solr successfully for over 5 years and recently decided to move to
SolrCloud. For the most part that has been easy, but we have repeated problems with our
rolling restarts, where servers remain functional but stay in recovery until they stop trying.
We restarted because we increased the JVM memory from 12GB to 16GB.


Does anyone have any insight as to what is going on here?

Is there a special procedure I should use for stopping and starting a host?

Is it OK to do a rolling restart on all the nodes in a shard?


Any insight would be appreciated.


Configuration:


We have a group of servers with multiple collections. Each collection consists of one shard
and multiple replicas. We are running the latest stable version of SolrCloud 6.6 on Ubuntu
LTS and Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_66 25.66-b17.


(collection)         (shard)      (replicas)

journals_stage   ->  shard1   ->  solr-220 (leader), solr-223, solr-221, solr-222


Problem:


Restarting the system puts the replicas in a recovery state that they never exit. They eventually
give up after 500 tries. If I query the individual replicas directly, the data is still
available.


Using tcpdump, I see the replicas sending the following request to the leader (the leader
appears to be active).


The exchange goes like this:


solr-220 is the leader.

solr-221 to solr-220


10:18:42.426823 IP solr-221:54341 > solr-220:8983:


POST /solr/journals_stage_shard1_replica1/update HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0
Content-Length: 108
Host: solr-220:8983
Connection: Keep-Alive


commit_end_point=true&openSearcher=false&commit=true&softCommit=false&waitSearcher=true&wt=javabin&version=2
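
As a sanity check on the capture, the form body above can be rebuilt from its parameters and compared with the Content-Length header (108). This is only a sketch in plain Java: the class and method names are made up for illustration and are not part of Solr or SolrJ, and nothing is sent to any server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Solr code: rebuilds the captured form body.
class CommitBodyDemo {

    // Parameters in the order they appear in the tcpdump capture.
    static Map<String, String> capturedParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("commit_end_point", "true");
        p.put("openSearcher", "false");
        p.put("commit", "true");
        p.put("softCommit", "false");
        p.put("waitSearcher", "true");
        p.put("wt", "javabin");
        p.put("version", "2");
        return p;
    }

    // Joins the parameters as application/x-www-form-urlencoded text.
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = formBody(capturedParams());
        System.out.println(body);
        System.out.println("length = " + body.length());
    }
}
```

The body is pure ASCII, so its character count equals its byte count and can be compared directly with the header value.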


solr-220 back to solr-221


IP solr-220:8983 > solr-221:54341: Flags [P.], seq 1:5152, ack 385, win 235, options [nop,nop,TS val 858155553 ecr 858107069], length 5151
..HTTP/1.1 500 Server Error
Content-Type: application/octet-stream
Content-Length: 5060


.responseHeader..&statusT..%QTimeC.%error..#msg?.For input string: "1578578283947098112".%trace?.&java.lang.NumberFormatException: For input string: "1578578283947098112"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:583)
        at java.lang.Integer.parseInt(Integer.java:615)
        at org.apache.lucene.queries.function.docvalues.IntDocValues.getRangeScorer(IntDocValues.java:89)
        at org.apache.solr.search.function.ValueSourceRangeFilter$1.iterator(ValueSourceRangeFilter.java:83)
        at org.apache.solr.search.SolrConstantScoreQuery$ConstantWeight.scorer(SolrConstantScoreQuery.java:100)
        at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
        at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
        at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
        at org.apache.solr.update.DeleteByQueryWrapper$1.scorer(DeleteByQueryWrapper.java:90)
        at org.apache.lucene.index.BufferedUpdatesStream.applyQueryDeletes(BufferedUpdatesStream.java:709)
        at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:267)
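
To illustrate the top frames of the trace: the failing call is Integer.parseInt on the string "1578578283947098112", which is too large for a 32-bit int but fits in a 64-bit long. The sketch below only reproduces that parsing behavior in plain Java. The class and method names are made up for illustration and are not part of Solr or Lucene, and it does not explain why a value of this size reaches an int range filter.

```java
// Hypothetical demo class, not Solr or Lucene code.
class VersionParseDemo {

    // Value copied from the error message in the trace.
    static final String VALUE = "1578578283947098112";

    // True if the string parses as a 32-bit signed integer.
    static boolean fitsInInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the string parses as a 64-bit signed integer.
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("int max:      " + Integer.MAX_VALUE);
        System.out.println("long max:     " + Long.MAX_VALUE);
        System.out.println("fits in int:  " + fitsInInt(VALUE));
        System.out.println("fits in long: " + fitsInLong(VALUE));
    }
}
```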

