lucene-solr-user mailing list archives

From Andre Bois-Crettez <andre.b...@kelkoo.com>
Subject Re: FW: Replication error and Shard Inconsistencies..
Date Wed, 05 Dec 2012 17:57:20 GMT
Not sure, but maybe you are running out of file descriptors?
On each Solr instance, look at the "Dashboard" admin page; there is a
bar labelled "File Descriptor Count".

However, if that were the case, I would expect to see lots of errors in
the Solr logs...
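If the admin UI is awkward to reach, a rough way to check the same thing from the shell (a sketch for Linux; the pgrep pattern for the Solr JVM is an assumption, adjust it to however you start Solr):

```shell
# Count open file descriptors for the Solr JVM via /proc (Linux only).
SOLR_PID=$(pgrep -f start.jar | head -n 1)
echo "open fds: $(ls "/proc/${SOLR_PID}/fd" 2>/dev/null | wc -l)"
# Compare against the per-process limit:
echo "fd limit: $(ulimit -n)"
```

If the open count is creeping toward the limit, raising it (e.g. via /etc/security/limits.conf) is the usual fix.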

André


On 12/05/2012 06:41 PM, Annette Newton wrote:
> Sorry to bombard you - final update of the day...
>
> One thing I have noticed is that we have a lot of connections between
> the Solr boxes stuck in CLOSE_WAIT, and they hang around for ages.
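One way to quantify that observation, as a sketch: count the CLOSE_WAIT sockets per remote peer. This assumes the net-tools netstat is available on the boxes; column 5 of its TCP output is the foreign address, column 6 the state.

```shell
# Count sockets stuck in CLOSE_WAIT, grouped by remote host.
netstat -tan 2>/dev/null | awk '
    $6 == "CLOSE_WAIT" { split($5, a, ":"); count[a[1]]++ }
    END                { for (h in count) print h, count[h] }'
```

A steadily growing count against the same peers would point at connections not being closed on the application side.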
>
> -----Original Message-----
> From: Annette Newton [mailto:annette.newton@servicetick.com]
> Sent: 05 December 2012 13:55
> To: solr-user@lucene.apache.org
> Subject: FW: Replication error and Shard Inconsistencies..
>
> Update:
>
> I did a full restart of the Solr cloud setup: stopped all the instances,
> cleared down ZooKeeper, and started them up individually.  I then removed the
> index from one of the replicas, restarted Solr, and it replicated OK.  So I'm
> wondering whether this is something that happens over a period of time.
>
> Also, just to let you know, I changed the schema a couple of times and
> reloaded the cores on all instances before the problem appeared.  I don't
> know whether this could have contributed to it.
>
> Thanks.
>
> -----Original Message-----
> From: Annette Newton [mailto:annette.newton@servicetick.com]
> Sent: 05 December 2012 09:04
> To: solr-user@lucene.apache.org
> Subject: RE: Replication error and Shard Inconsistencies..
>
> Hi Mark,
>
> Thanks so much for the reply.
>
> We are using the release version of 4.0.
>
> It's very strange: replication appears to be underway, but no files are being
> copied across.  I have attached the log from the new node that I tried to
> bring up, along with the schema and config we are using.
>
> I think it's probably something weird with our config, so I'm going to play
> around with it today.  If I make any progress I'll send an update.
>
> Thanks again.
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: 05 December 2012 00:04
> To: solr-user@lucene.apache.org
> Subject: Re: Replication error and Shard Inconsistencies..
>
> Hey Annette,
>
> Are you using Solr 4.0 final? A version of 4x or 5x?
>
> Do you have the logs for when the replica tried to catch up to the leader?
>
> Stopping and starting the node is actually a fine thing to do. Perhaps you
> can try it again and capture the logs.
>
> If a node is not listed as live but is in the clusterstate, that is fine; it
> shouldn't be consulted.  To remove it, you either have to unload it with the
> core admin API, or you can manually delete its registered state under the
> node states node that the Overseer looks at.
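As a sketch of the core admin unload route Mark describes (the host and core name below are made-up examples; unloading the core should also drop the replica's entry from the clusterstate):

```shell
# Unload a stale core via the CoreAdmin API.  Host and core name are
# placeholders -- substitute the node and core you want removed.
SOLR_HOST="localhost:8983"
CORE="collection1_shard4_replica2"
curl "http://${SOLR_HOST}/solr/admin/cores?action=UNLOAD&core=${CORE}"
```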
>
> Also, it would be useful to see the logs of the new node coming up.  There
> should be info about what happens when it tries to replicate.
>
> It almost sounds like replication is just not working for your setup at all,
> and that you have to tweak some configuration.  You shouldn't see these nodes
> marked as active then, though - so we should get to the bottom of this.
>
> - Mark
>
> On Dec 4, 2012, at 4:37 AM, Annette Newton<annette.newton@servicetick.com>
> wrote:
>
>> Hi all,
>>
>> I have a quite weird issue with SolrCloud.  I have a 4-shard, 2-replica
>> setup.  Yesterday one of the nodes lost communication with the cloud, which
>> resulted in it trying to run replication.  That failed, leaving me with a
>> shard (shard 4) that has 2,833,940 documents on the leader and 409,837 on
>> the follower - obviously a big discrepancy, and it means queries return
>> differing results depending on which of these nodes serves the data.  There
>> is no indication of a problem on the admin site other than the discrepancy
>> in document counts; the nodes are all marked as active.
>>
>> So I thought I would force replication to happen again by stopping and
>> starting Solr (probably the wrong thing to do), but this resulted in no
>> change.  So I turned off that node and replaced it with a new one.  In
>> ZooKeeper, live_nodes doesn't list that machine, but it is still shown as
>> active in clusterstate.json; I have attached images showing this.  This
>> means the new node hasn't replaced the old node, but is now a replica on
>> shard 1!  Also, that node doesn't appear to have replicated shard 1's data
>> anyway; it was never marked as replicating or anything.
>>
>> How do I clear the ZooKeeper state without taking down the entire SolrCloud
>> setup?  How do I force a node to replicate from the others in the shard?
>>
>> Thanks in advance.
>>
>> Annette Newton
>>
>>
>> <LiveNodes.zip>
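On the force-replication question above, one option worth trying is to ask the lagging replica's ReplicationHandler to pull the index directly. A sketch only: the host and core name are placeholders, and it assumes the replication handler is registered at /replication as in the stock 4.0 solrconfig; in SolrCloud this is normally driven by recovery itself.

```shell
# Ask the lagging replica to fetch the index from its leader.
REPLICA="replica-host:8983"               # placeholder host:port
CORE="collection1_shard4_replica2"        # placeholder core name
curl "http://${REPLICA}/solr/${CORE}/replication?command=fetchindex"
# Inspect progress afterwards:
curl "http://${REPLICA}/solr/${CORE}/replication?command=details"
```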
>
>
>
>
>
>
> --
> André Bois-Crettez
>
> Search technology, Kelkoo
> http://www.kelkoo.com/

Kelkoo SAS
A simplified joint-stock company (société par actions simplifiée)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for
their addressees.  If you are not the intended recipient, please delete this
message and notify the sender.
