lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrCloud recovery
Date Fri, 25 Jan 2019 15:51:08 GMT
coreLoadThreads only affects core _loading_; recovery happens later, so
I'd be surprised if this really made a difference, but you
never know.

I'm more interested in _why_ recovery takes so long,
and why recovery happens in the first place. It's normal
for replicas when starting up to go from down->recovering->active,
that's just part of the normal cycle. But the recovering state
should be relatively short absent having to replicate the
index from the leader.

If active indexing is going on, then the replicas may have to
copy their index down from the leader. Does this happen
on a system that is not indexing?

What version of Solr? All the state changes go through
the Overseer, and there were some very significant improvements
in Solr 6.6+, see:
https://issues.apache.org/jira/browse/SOLR-10265

And can you put a number to "rather long"? There's a built-in
3 minute wait for leader election if there's no leader for
a slice. That's not relevant if the replica in recovery
belongs to a shard that already has a leader, but if you
restart your entire cluster it can come into play.
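
In case it helps to put numbers on this, here is a rough sketch (not
something I've run against your cluster) that summarizes the output of
the Collections API CLUSTERSTATUS call: how many replicas are in each
state, and which shards currently have no replica flagged as leader.
The base URL and the function names are assumptions for illustration.
It also assumes the usual JSON layout of the response, that is
cluster -> collections -> shards -> replicas, with a "state" and an
optional "leader" entry per replica.

```python
import json
import urllib.request
from collections import Counter


def fetch_cluster_status(base_url="http://localhost:8983/solr"):
    """Fetch CLUSTERSTATUS as a dict. base_url is an assumed default."""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(status):
    """Return (replica count per state, sorted list of leaderless shards)."""
    states = Counter()
    leaderless = []
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            has_leader = False
            for replica in shard.get("replicas", {}).values():
                states[replica.get("state", "unknown")] += 1
                # The leader flag is reported as the string "true".
                if str(replica.get("leader", "false")).lower() == "true":
                    has_leader = True
            if not has_leader:
                leaderless.append(coll_name + "/" + shard_name)
    return states, sorted(leaderless)
```

Calling summarize(fetch_cluster_status()) every few seconds after a
restart and logging the result with a timestamp would show how long
replicas actually sit in "recovering", and whether any shard is
leaderless during that time.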

Best,
Erick

On Fri, Jan 25, 2019 at 3:32 AM Hendrik Haddorp <hendrik.haddorp@gmx.net> wrote:
>
> Thanks, that sounds good. Didn't know that parameter.
>
> On 25.01.2019 11:23, Vadim Ivanov wrote:
> >   You can try to tweak solr.xml
> >
> >
> > coreLoadThreads
> > Specifies the number of threads that will be assigned to load cores in parallel.
> >
> > https://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html
> >
> >>> -----Original Message-----
> >>> From: Hendrik Haddorp [mailto:hendrik.haddorp@gmx.net]
> >>> Sent: Friday, January 25, 2019 11:39 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: SolrCloud recovery
> >>>
> >>> Hi,
> >>>
> >>> I have a SolrCloud with many collections. When I restart an instance and
> >>> the replicas are recovering, I noticed that the number of replicas
> >>> recovering at any one time is usually around 5. This results in the
> >>> recovery taking rather long. Is there a configuration option that controls
> >>> how many replicas can recover in parallel?
> >>>
> >>> thanks,
> >>> Hendrik
>
