lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jessica Mallet <mewmewb...@gmail.com>
Subject Re: waitForLeaderToSeeDownState when leader is down
Date Fri, 18 Apr 2014 00:16:48 GMT
Hi Mark,

I'm using "roughly" the 4x branch up to this commit:
https://github.com/apache/lucene-solr/tree/25aaf44221e3a3be8fab1ba22f16b13f5df6c64c.
Will you please point me to the jira that addressed this? I couldn't find
it with "waitForLeaderToSeeDownState".

Thanks,
Jessica


On Wed, Apr 16, 2014 at 4:35 PM, Mark Miller <markrmiller@gmail.com> wrote:

> What version are you testing? Thought we had addressed this.
> --
> Mark Miller
> about.me/markrmiller
>
> On April 16, 2014 at 6:02:09 PM, Jessica Mallet (mewmewball@gmail.com)
> wrote:
>
> Hi Furkan,
>
> Thanks for the reply. I understand the intent. However, in the case I
> described, the follower is blocked on looking for a leader (throws the
> pasted exception because it can't find the leader) before it participates
> in election; therefore, it will never come up while the leader waits for it
> to come up (they're deadlocked waiting for each other). What I'm suggesting
> is that maybe the follower should just just skip
> waitForLeaderToSeeDownState
> when there's no leader (instead of failing with the pasted stacktrace) and
> go ahead and start participating in election. That way the leader will see
> more replicas come up, and they can sync with each other and move on.
>
> Thanks,
> Jessica
>
>
> On Sat, Apr 12, 2014 at 4:14 PM, Furkan KAMACI <furkankamaci@gmail.com
> >wrote:
>
> > Hi;
> >
> > There is an explanation as follows: "This is meant to protect the case
> > where you stop a shard or it fails and then the first node to get started
> > back up has stale data - you don't want it to just become the leader. So
> we
> > wait to see everyone we know about in the shard up to 3 or 5 min by
> > default. Then we know all the shards participate in the leader election
> and
> > the leader will end up with all updates it should have." You can check it
> > from here:
> >
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wng_YykcXgGeNTGCXGUHhCJhiDear-JygpgRnKAEdRZpFA@mail.gmail.com%3E
> >
> > Thanks;
> > Furkan KAMACI
> >
> >
> > 2014-04-08 23:51 GMT+03:00 Jessica Mallet <mewmewball@gmail.com>:
> >
> > > To clarify, when I said "leader" and "follower" I meant the old leader
> > and
> > > follower before the zookeeper session expiration. When they're
> recovering
> > > there's no leader.
> > >
> > >
> > > On Tue, Apr 8, 2014 at 1:49 PM, Jessica Mallet <mewmewball@gmail.com>
> > > wrote:
> > >
> > > > I'm playing with dropping the cluster's connections to zookeeper and
> > then
> > > > reconnecting them, and during recovery, I always see this on the
> > leader's
> > > > logs:
> > > >
> > > > ElectionContext.java (line 361) Waiting until we see more replicas up
> > for
> > > > shard shard1: total=2 found=1 timeoutin=139902
> > > >
> > > > and then on the follower, I see:
> > > > SolrException.java (line 121) There was a problem finding the leader
> in
> > > > zk:org.apache.solr.common.SolrException: Could not get leader props
> > > > at
> > > >
> > org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)
> > > > at
> > > >
> > org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)
> > > > at
> > > >
> > >
> >
> org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
> > > > at
> > > >
> > >
> >
> org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
> > > > at
> > > > org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> > > > at
> > > > org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
> > > > at
> > > >
> > >
> >
> org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
> > > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> > > > KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1
> > > > at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> > > > at
> > > > org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> > > > at
> > > >
> > >
> >
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)
> > > > at
> > > >
> > >
> >
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)
> > > > at
> > > >
> > >
> >
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
> > > > at
> > > >
> > org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)
> > > > at
> > > >
> > org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)
> > > > ... 6 more
> > > >
> > > > They block each other's progress until leader decides to give up and
> > not
> > > > wait for more replicas to come up:
> > > >
> > > > ElectionContext.java (line 368) Was waiting for replicas to come up,
> > but
> > > > they are taking too long - assuming they won't come back till later
> > > >
> > > > and then recovery moves forward again.
> > > >
> > > > Should waitForLeaderToSeeDownState move on if there's no leader at
> the
> > > > moment?
> > > > Thanks,
> > > > Jessica
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message