lucene-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
Date Fri, 06 Dec 2013 23:37:37 GMT


Mark Miller commented on SOLR-4260:

I've fixed some things since 4.6. I only had time to focus on the leader-not-going-down case
for 4.6, and I spent a bunch more time on this case after 4.6 was released. Unfortunately, I think
there are a couple of issues at play here: some of the new changes make existing holes easier
to spot, and the chaos monkey tests were accidentally disabled for some time, so small issues
may have crept in.

I *think* the remaining issue is mostly around SOLR-5516. Need to come up with a better idea
than a really long wait though - but if someone wants to help test, putting in a long wait
and stressing this would be useful to see if it is indeed the main remaining issue.

I recently put in a lot of time improving the situation and I need to focus on other things
for a bit, but I'll keep coming back to this as I can.

> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>                 Key: SOLR-4260
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment:
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>         Attachments:,, clusterstate.png
> After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer
> we see inconsistencies between the leader and replica for some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small
> deviation in the number of documents. The leader and slave deviate by roughly 10-20 documents,
> not more.
> Results hopping ranks in the result set for identical queries got my attention: there
> were small IDF differences for exactly the same record, causing a record to shift positions
> in the result set. During those tests no records were indexed. Consecutive catch-all queries
> also return different numDocs values.
> We're running a 10 node test cluster with 10 shards and a replication factor of two and
> frequently reindex using a fresh build from trunk. I've not seen this issue for quite some
> time until a few days ago.
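
One way to check for the deviation described above is to send a match-all query to each core directly with distrib=false, so that only that core answers, and then compare the counts within each shard. The following is a minimal sketch, not part of the original report: the function names, the core URLs, and the shape of the input dictionary are all assumptions made for illustration. It also assumes no indexing is in progress and that all updates have been committed, since otherwise the counts can differ legitimately.

```python
import json
import urllib.parse
import urllib.request


def core_num_docs(core_url, timeout=10):
    """Return numFound for a match-all query answered by this core only.

    core_url is a hypothetical base URL for a single core, for example
    "http://host:8983/solr/collection1_shard1_replica1".
    """
    params = urllib.parse.urlencode(
        {"q": "*:*", "rows": 0, "wt": "json", "distrib": "false"}
    )
    url = f"{core_url}/select?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


def find_deviations(counts_by_shard):
    """Given {shard: {core_url: count}}, return {shard: max - min}
    for every shard whose cores report different document counts."""
    deviations = {}
    for shard, counts in counts_by_shard.items():
        values = list(counts.values())
        if len(set(values)) > 1:
            deviations[shard] = max(values) - min(values)
    return deviations
```

To use it, build the input by calling core_num_docs for every core of every shard, then pass the result to find_deviations. An empty result means all cores of each shard agree.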

This message was sent by Atlassian JIRA
