From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: Solr Admin Page Says Leader is Down, Replica is Up, Zookeeper Says That They are Both Active
Date Sun, 19 May 2013 11:01:20 GMT
This is from my error log:


org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderUrl(ZkStateReader.java:458)
    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:843)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
    at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
    at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:900)
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
    at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1195)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:696)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:166)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)



2013/5/19 Furkan KAMACI <furkankamaci@gmail.com>

> Hi Mark;
>
> I continued my tests and noticed this issue. I have 1 leader and 1 replica
> at each shard. I killed the leader:
>
> * The Cloud graph says that the leader has gone (which I expect). However,
> the previous non-leader is still not a leader (which I didn't expect).
>
> * Zookeeper's clusterstate.json says that the node that has gone is still
> active and the leader (which I didn't expect).
>
> * At the Cloud tree link, /collections/collection1/leaders/shard1 says that
> the previous non-leader is the leader (which I expect).
>
> Is there any contradiction here, or am I missing anything?
>
> PS: I reloaded the core at the replica and got errors saying "no registered
> leader was found" and "error getting leader from zk for shard shard1".
> Could this be an issue with Zookeeper too?
>
>
> 2013/5/14 Mark Miller <markrmiller@gmail.com>
>
>> The actual state is a mix of clusterstate.json and the ephemeral live
>> nodes - a node may be listed as active or whatever, but if its live node
>> is not up, that doesn't matter - it's considered down.
>>
>> - Mark
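
[The state merge Mark describes can be sketched as follows. This is a hypothetical helper, not Solr's actual code: the published state from clusterstate.json only counts if the node also has an ephemeral entry under /live_nodes.]

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class EffectiveState {

    // A replica's effective state is its published clusterstate.json "state"
    // gated by live-node presence: clusterstate.json may still say "active"
    // for a dead node, but the ephemeral /live_nodes entry is the ground truth.
    static String effectiveState(String publishedState, String nodeName,
                                 Set<String> liveNodes) {
        return liveNodes.contains(nodeName) ? publishedState : "down";
    }

    public static void main(String[] args) {
        // Only the replica's node is still live; the killed leader is not.
        Set<String> liveNodes = new HashSet<>(Arrays.asList("10.0.0.2:8983_solr"));

        // Killed leader: still "active" in clusterstate.json, but not live.
        System.out.println(effectiveState("active", "10.0.0.1:8983_solr", liveNodes)); // prints "down"
        // Surviving replica: published "active" and live.
        System.out.println(effectiveState("active", "10.0.0.2:8983_solr", liveNodes)); // prints "active"
    }
}
```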
>>
>> On May 14, 2013, at 8:08 AM, Furkan KAMACI <furkankamaci@gmail.com>
>> wrote:
>>
>> > The node is shown as down on the admin page. It says there is one
>> > replica for that shard but the leader is dead (no new leader has been
>> > elected!). However, when I check the zookeeper information from
>> > /clusterstate.json on the admin page, I see this:
>> >
>> > "shard2":{
>> >   "range":"b3330000-e665ffff",
>> >   "state":"active",
>> >   "replicas":{
>> >     "10.***.**.*1:8983_solr_collection1":{
>> >       "shard":"shard2",
>> >       "state":"active",
>> >       "core":"collection1",
>> >       "collection":"collection1",
>> >       "node_name":"10.***.**.*1:8983_solr",
>> >       "base_url":"http://10.***.**.*1:8983/solr",
>> >       "leader":"true"},
>> >     "10.***.**.**2:8983_solr_collection1":{
>> >       "shard":"shard2",
>> >       "state":"active",
>> >       "core":"collection1",
>> >       "collection":"collection1",
>> >       "node_name":"10.***.***.**2:8983_solr",
>> >       "base_url":"http://10.***.***.**2:8983/solr"}}},
>> >
>> > I mean the dead node is still listed as active!
>> >
>> > I have exceptions and warnings in my solr log:
>> >
>> > ...
>> > INFO: Updating cluster state from ZooKeeper...
>> > May 14, 2013 2:31:12 PM org.apache.solr.cloud.ZkController publishAndWaitForDownStates
>> > WARNING: Timed out waiting to see all nodes published as DOWN in our cluster
>> > ...
>> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController getLeader
>> > SEVERE: Error getting leader from zk
>> > org.apache.solr.common.SolrException: There is conflicting information
>> > about the leader of shard: shard2 our state
>> > says:http://10.***.***.*1:8983/solr/collection1/
>> > but zookeeper says:http://10.***.***.**2:8983/solr/collection1/
>> >     at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:849)
>> >     at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
>> >     at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
>> >     at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
>> >     at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
>> >     at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
>> >     at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
>> >     at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >     at java.lang.Thread.run(Thread.java:722)
>> >
>> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
>> > INFO: publishing core=collection1 state=down
>> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
>> > INFO: numShards not found on descriptor - reading it from system property
>> > May 14, 2013 2:32:14 PM org.apache.solr.common.SolrException log
>> > SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk for shard shard2
>> >     at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
>> >     at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
>> >     at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
>> >     at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
>> >     at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
>> >     at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
>> >     at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
>> >     at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >     at java.lang.Thread.run(Thread.java:722)
>> >
>> > and after that it closes the main searcher.
>> >
>> > How can I get rid of this error, and why is there a mismatch between
>> > the admin page's graph and the clusterstate?
>>
>>
>

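[The "conflicting information about the leader" exception in the logs above comes from comparing two sources of truth: the leader URL derived from clusterstate.json and the one registered under /collections/&lt;collection&gt;/leaders/&lt;shard&gt;. The sketch below is a hypothetical mirror of that consistency check, not Solr's actual code.]

```java
import java.util.Objects;

public class LeaderCheck {

    // Fail when the leader recorded in our cached cluster state disagrees
    // with the leader registered in the ZooKeeper leader znode. This mimics
    // the check whose failure message appears in the log above.
    static void assertLeadersAgree(String clusterStateLeaderUrl,
                                   String leaderZnodeUrl) {
        if (!Objects.equals(clusterStateLeaderUrl, leaderZnodeUrl)) {
            throw new IllegalStateException(
                "There is conflicting information about the leader of shard:"
                + " our state says:" + clusterStateLeaderUrl
                + " but zookeeper says:" + leaderZnodeUrl);
        }
    }

    public static void main(String[] args) {
        // Matches the situation in the log: the killed leader is still in the
        // cached state, while ZooKeeper already registered the new leader.
        try {
            assertLeadersAgree("http://10.0.0.1:8983/solr/collection1/",
                               "http://10.0.0.2:8983/solr/collection1/");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A stale cached state like this normally resolves once the node finishes syncing cluster state from ZooKeeper; the exception marks the window where the two views disagree.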