lucene-solr-user mailing list archives

From Lukas Mikuckis <lukasmikuc...@gmail.com>
Subject Re: SolrCloud from "Stopping recovery for" warnings to crash
Date Mon, 24 Mar 2014 14:21:11 GMT
Yes, we upgraded Solr from 4.6.1 to 4.7 three weeks ago (two weeks before Solr
started crashing).
When we upgraded, we only upgraded Solr itself and changed the version numbers
in the collection configs.

When Solr crashes we do get an OOM, but only about 2 hours after the first
"Stopping recovery" warnings.
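To measure that roughly 2-hour gap across several crashes, a small script over the Solr log can help. The sketch below is a hypothetical helper (the function names are made up for illustration). It assumes log lines start with the "LEVEL - yyyy-MM-dd HH:mm:ss.SSS; class; message" prefix seen in the log excerpts quoted below, and that the OOM shows up as a timestamped line containing "OutOfMemoryError". Adjust the pattern if your log4j layout differs.

```python
import re
from datetime import datetime

# Assumed Solr 4.x log prefix, as in the excerpts in this thread, e.g.
# "WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; ..."
LINE_RE = re.compile(
    r"^(\w+)\s+-\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});"
)


def first_timestamp(lines, needle):
    """Return the timestamp of the first prefixed log line containing needle."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and needle in line:
            return datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S.%f")
    return None


def recovery_to_oom_gap(lines):
    """Time from the first 'Stopping recovery for' warning to the first OOM.

    Returns None if either event is missing. Assumes the OOM is logged on a
    prefixed line containing 'OutOfMemoryError' (hypothetical; stack-trace
    continuation lines without the prefix are ignored).
    """
    lines = list(lines)  # allow an open file object; we scan the lines twice
    start = first_timestamp(lines, "Stopping recovery for")
    oom = first_timestamp(lines, "OutOfMemoryError")
    if start is None or oom is None:
        return None
    return oom - start
```

Because the helper copies its input into a list, an open file can be passed directly, e.g. `with open("solr.log") as f: print(recovery_to_oom_gap(f))` (the log path is hypothetical).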

Do you have any idea what causes the "Stopping recovery" warnings to be
thrown? Right now we have no idea what could be causing this issue.

Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar <shalinmangar@gmail.com>:
>
> Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can
> cause out of memory issues. Can you check your logs for out of memory
> errors?
>
> On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis <lukasmikuckis@gmail.com> wrote:
> > Solr version: 4.7
> >
> > Architecture:
> > 2 Solr nodes (1 shard: leader + replica)
> > 3 ZooKeeper nodes
> >
> > Servers:
> > * ZooKeeper + Solr (4 GB heap), 8 GB RAM, 2 CPU cores
> > * ZooKeeper + Solr (4 GB heap), 8 GB RAM, 2 CPU cores
> > * ZooKeeper only
> >
> > Solr data:
> > * 21 collections
> > * Many fields, small docs; 1k to 500k docs per collection
> >
> > About a week ago Solr started crashing. It crashes every day, 3-4 times a
> > day, usually at night. I can't tell what it could be related to, because
> > we haven't made any configuration changes around that time. The load
> > hasn't changed either.
> >
> >
> > Everything starts with "Stopping recovery for ..." warnings (each warning
> > is repeated several times):
> >
> > WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
> > zkNodeName=core_node1core=******************
> >
> > WARN  org.apache.solr.cloud.ElectionContext; cancelElection did not find
> > election node to remove
> >
> > WARN  org.apache.solr.update.PeerSync; no frame of reference to tell if
> > we've missed updates
> >
> > WARN  - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame
> > of reference to tell if we've missed updates
> >
> > WARN  - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File
> > _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
> >
> > WARN  - 2014-03-23 04:00:54.126;
> > org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay
> > tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0000000000000003272
> > refcount=2} active=true starting pos=356216606
> >
> > Then the "Stopping recovery for ..." warnings appear again:
> >
> > WARN  org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for
> > zkNodeName=core_node1core=******************
> >
> > ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: No registered leader was found after
> > waiting for 4000ms , collection: collection1 slice: shard1
> >
> > ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: I was asked to wait on state down for
> > IP:PORT_solr but I still do not see the requested state. I see state:
> > active live:false
> >
> >
> > After this, the servers mostly didn't recover.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
