lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
Date Mon, 23 Dec 2013 23:18:51 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856013#comment-13856013
] 

Mark Miller commented on SOLR-4260:
-----------------------------------

bq. so we kind of punted

The other thing to note is that if you restart the shard or that node or the cluster, you
should be able to do it without losing any data. It will recover from the leader when everything
else is working correctly.

> Inconsistent numDocs between leader and replica
> -----------------------------------------------
>
>                 Key: SOLR-4260
>                 URL: https://issues.apache.org/jira/browse/SOLR-4260
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>         Environment: 5.0.0.2013.01.04.15.31.51
>            Reporter: Markus Jelsma
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 5.0, 4.7
>
>         Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer
we see inconsistencies between the leader and replica for some shards.
> Each core hold about 3.3k documents. For some reason 5 out of 10 shards have a small
deviation in then number of documents. The leader and slave deviate for roughly 10-20 documents,
not more.
> Results hopping ranks in the result set for identical queries got my attention, there
were small IDF differences for exactly the same record causing a record to shift positions
in the result set. During those tests no records were indexed. Consecutive catch all queries
also return different number of numDocs.
> We're running a 10 node test cluster with 10 shards and a replication factor of two and
frequently reindex using a fresh build from trunk. I've not seen this issue for quite some
time until a few days ago.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message