lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Inconsistent replicas in a shard
Date Fri, 06 Oct 2017 19:26:34 GMT
Shouldn't be happening of course (replicas with different numbers of
docs), at least permanently. It can regularly happen on a _temporary_
basis however. And there are ways you can cause this to happen
permanently. Here's an outline.

> Temporarily out of sync. Because commits happen at different wall-clock
times on each replica, replicas in the same shard can be skewed for up to
the autocommit interval. Ways to check:
>> Stop indexing, wait for CDCR to catch up _plus_ your autocommit
interval, then check.
>> Fire a query at each replica that cuts off at some time in the past and
add distrib=false, then examine the number of hits returned. The query
looks something like
"..solr/collection1_shard1_replica1/query?q=*:*&fq=timestamp:[* TO NOW-(2x autocommit interval + CDCR latency)]&distrib=false".
This requires a reliable timestamp field, of course.
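
The per-replica check above can be scripted. Here is a minimal Python
sketch, not a definitive tool. Assumptions that are not from the thread:
the host and port in the usage note, the helper names, and that the field
is literally called "timestamp". It relies on Solr date math
(NOW-<n>SECONDS) for the cutoff and on the standard JSON response layout
(response.numFound).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_cutoff_fq(field, autocommit_seconds, cdcr_latency_seconds):
    """Build a filter that only matches docs older than
    2x the autocommit interval plus the CDCR latency."""
    cutoff = 2 * autocommit_seconds + cdcr_latency_seconds
    return f"{field}:[* TO NOW-{cutoff}SECONDS]"


def build_replica_query_url(core_url, fq):
    """Build a non-distributed count query against a single replica core."""
    params = {
        "q": "*:*",
        "fq": fq,
        "distrib": "false",  # ask this core only, do not fan out
        "rows": "0",         # only the hit count is needed
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/query?{urlencode(params)}"


def fetch_num_found(url, timeout=10):
    """Run the query and return the number of hits reported by the core."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def replicas_agree(counts):
    """True if every replica reported the same hit count."""
    return len(set(counts.values())) <= 1
```

To use it, build one URL per replica core in the shard (for example
http://localhost:8983/solr/collection1_shard1_replica1, a hypothetical
address), call fetch_num_found on each, collect the results in a dict
keyed by core name, and pass that dict to replicas_agree. If the counts
still differ with the cutoff applied, the skew is probably not just commit
timing.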

> Permanently out of sync:
>> If you ever fired a FORCELEADER at a replica, you are risking this.
>> You are also at risk if you stopped the (non-leader) replica and kept
indexing, then stopped the leader and started the replica back up. Solr
does the best it can to preserve the data, but a replica that was offline
never received those updates, so it has nothing in its tlog to replay. If
that stale replica is then elected leader, it won't have all the updates.


Best,
Erick

On Fri, Oct 6, 2017 at 12:04 PM, Webster Homer <webster.homer@sial.com> wrote:
> We are using Solr 6.2.0 in solrcloud mode
>
> I have a QA solrcloud that has multiple collections. All collections have 2
> shards each with two replicas.
>
> I have several replicas where the numDocs in the same shard do not match.
> In two collections with three different shards I have one replica with data
> and the other has no data. All six replicas appear healthy in the Solr
> console.
>
> So how does that happen where two replicas in the same shard have different
> amounts of data?
>
> How do you diagnose this when the replicas are active and seemingly healthy?
>
> How do I get the replicas with no data, get data from their leader? In all
> three cases the replica with data is the leader.
>
> I also see two other collections where the replica's numDocs don't quite
> match
> In those two cases the leader has a few more docs than the other replica
>
> How to remedy this situation?
>
> This solrcloud is a target of CDCR replication, but I'm not sure why that
> would matter since I believe cdcr has the shard leaders communicate and the
> followers should just get their updates from their leader as they would
> from a normal update
>
> I'm just lucky that this is not a production solrcloud! Still need to know
> how to fix it.
>
> Thanks!
>
