lucene-solr-user mailing list archives

From Matt Kuiper <matt.kui...@issinc.com>
Subject RE: Clusterstate - state active
Date Thu, 09 Apr 2015 14:36:57 GMT
Erick,

I do not give it an explicit name. I use a call like:

 curl "172.29.24.47:8983/solr/admin/collections?action=ADDREPLICA&collection=kla_collection&shard=shard25&node=172.29.24.75:8983_solr"

It does not appear to be reusing the name, if by name you mean core_node* or core. Both are different below for the replicas marked leader=true. Note that in the second section the leader is in the recovery_failed state...

            "shard25":{
              "range":"fae10000-ffffffff",
              "state":"active",
              "replicas":{
                "core_node48":{
                  "state":"active",
                  "core":"kla_collection_shard25_replica1",
                  "node_name":"172.29.24.48:8983_solr",
                  "base_url":"http://172.29.24.48:8983/solr",
                  "leader":"true"},
                "core_node59":{
                  "state":"active",
                  "core":"kla_collection_shard25_replica2",
                  "node_name":"172.29.24.47:8983_solr",
                  "base_url":"http://172.29.24.47:8983/solr"}}},


            "shard25":{
              "range":"fae10000-ffffffff",
              "state":"active",
              "replicas":{
                "core_node149":{
                  "state":"recovery_failed",
                  "core":"kla_collection_shard25_replica3",
                  "node_name":"172.29.24.75:8983_solr",
                  "base_url":"http://172.29.24.75:8983/solr",
                  "leader":"true"},
                "core_node150":{
                  "state":"recovering",
                  "core":"kla_collection_shard25_replica1",
                  "node_name":"172.29.24.76:8983_solr",
                  "base_url":"http://172.29.24.76:8983/solr"}}},

Thanks,
Matt

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, April 08, 2015 6:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Clusterstate - state active

Matt:

How are you creating the new replica? Are you giving it an explicit name? And especially is
it the same name as one you've already deleted?

'cause I can't really imagine why you'd be getting a ZK exception saying the node already
exists.

Shot in the dark here......

On Wed, Apr 8, 2015 at 4:11 PM, Matt Kuiper <matt.kuiper@issinc.com> wrote:
> Found this error, which likely explains my issue with new replicas not coming up; not sure of the next step. It almost looks like ZooKeeper's record of a shard's leader is not being updated?
>
> 4/8/2015, 4:56:03 PM
> ERROR
> ShardLeaderElectionContext
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
>         at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
>         at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
>         at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
>         at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
>         at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
>         at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
>         at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
>         at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
>         at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
>         ... 11 more
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
>         at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
>         at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)
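>
> (Side note: since the NodeExists error points at a leftover leader znode, one way to inspect it is with the stock zkCli.sh that ships with ZooKeeper. A diagnostic sketch only, assuming the ensemble is reachable on localhost:2181; deleting znodes by hand is risky, so look but don't touch:
>
>   ./zkCli.sh -server localhost:2181
>   [zk: localhost:2181(CONNECTED) 0] stat /collections/kla_collection/leaders/shard4
>   [zk: localhost:2181(CONNECTED) 1] get /collections/kla_collection/leaders/shard4
>
> The ephemeralOwner field in the stat output shows which ZooKeeper session still holds the registration.)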
>
> Matt
>
>
> -----Original Message-----
> From: Matt Kuiper [mailto:matt.kuiper@issinc.com]
> Sent: Wednesday, April 08, 2015 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Clusterstate - state active
>
> Erick, Anshum,
>
> Thanks for your replies!  Yes, it is the replica state that I am looking at, and this is the answer I was hoping for.
>
> I am working on a solution that involves moving some replicas to new Solr nodes as they are made available.  Before deleting the original replicas backing the shard, I check the replica state to make sure the new replicas are active.
>
> Initially it was working pretty well, but with more recent testing I regularly see the shard go down.  The two new replicas go into a failed recovery state after the original replicas are deleted, and the logs report that a registered leader was not found for the shard.  Initially I was concerned that maybe the new replicas were not fully synced with the leader, even though I had checked for the active state.
>
> Now I am wondering if the new replicas are somehow competing (or somehow reluctant) to become leader, and thus neither becomes leader.  I plan to test creating just one new replica on a new Solr node, checking that its state is active, then deleting the original replicas, and then creating the second new replica.
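>
> (As a sketch, that sequence could look like the following curl calls. ADDREPLICA/DELETEREPLICA are the standard Collections API actions; the node addresses and the core_node59 replica name are placeholders borrowed from earlier in this thread:
>
>   curl "172.29.24.47:8983/solr/admin/collections?action=ADDREPLICA&collection=kla_collection&shard=shard25&node=172.29.24.75:8983_solr"
>   # ...poll CLUSTERSTATUS until the new replica reports state=active...
>   curl "172.29.24.47:8983/solr/admin/collections?action=DELETEREPLICA&collection=kla_collection&shard=shard25&replica=core_node59"
>   curl "172.29.24.47:8983/solr/admin/collections?action=ADDREPLICA&collection=kla_collection&shard=shard25&node=172.29.24.76:8983_solr"
>
> Deleting the originals one at a time, rather than both at once, may also avoid leaving the shard without a registered leader.)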
>
> Any thoughts?
>
> Matt
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, April 08, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Clusterstate - state active
>
> Matt:
>
> In a word, "yes". Depending on the size of the index for that shard, the transition from Down->Recovering->Active may be too fast to catch.
> If replicating the index takes a while, though, you should at least see the "Recovering" state, during which time there won't be any searches forwarded to that node.
>
> Best,
> Erick
>
> On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper <matt.kuiper@issinc.com> wrote:
>> Hello,
>>
>> When creating a new replica, and the state is recorded as active within the ZK clusterstate, does that mean the new replica has synced with the leader replica for that particular shard?
>>
>> Thanks,
>> Matt
>>