lucene-dev mailing list archives

From "Alexey Kudinov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-3993) SolrCloud leader election on single node stucks the initialization
Date Thu, 25 Oct 2012 15:25:11 GMT
Alexey Kudinov created SOLR-3993:
------------------------------------

             Summary: SolrCloud leader election on single node stucks the initialization
                 Key: SOLR-3993
                 URL: https://issues.apache.org/jira/browse/SOLR-3993
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 4.0
         Environment: Windows 7, Tomcat 6
            Reporter: Alexey Kudinov


 setup:
1 node, 4 cores, 2 shards.
15 documents indexed.

problem:
The init stage times out.

probable cause:
According to the init flow, cores are initialized one by one, synchronously.
The main thread waits in ShardLeaderElectionContext.waitForReplicasToComeUp until the retry
threshold is reached, while the replica cores are not yet initialized. In other words, the
other replicas have no chance to come up in the meantime.
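
The suspected flow can be illustrated with a small stand-alone model. This is only a sketch under assumptions, not Solr code: the class name, the core-to-shard assignment, the polling loop, and the MAX_POLLS limit are all hypothetical stand-ins for the real election and retry logic. It shows that when cores are loaded on a single thread, the first core of each shard always exhausts its wait, because the core it is waiting for cannot be loaded until the wait is over.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the suspected flow; none of these names come from Solr.
public class SequentialInitSketch {
    static final int REPLICAS_PER_SHARD = 2; // assumed: 4 cores over 2 shards
    static final int MAX_POLLS = 5;          // assumed stand-in for the retry threshold

    /**
     * Loads cores one by one on the calling thread.
     * Each entry of cores is {coreName, shardName}.
     * Returns the total number of polls spent waiting for replicas.
     */
    static int loadSequentially(String[][] cores) {
        Map<String, Integer> liveReplicas = new HashMap<>();
        int wastedPolls = 0;
        for (String[] core : cores) {
            String shard = core[1];
            int up = liveReplicas.merge(shard, 1, Integer::sum);
            if (up == 1) {
                // First core of the shard: it waits for the other replicas.
                int polls = 0;
                while (liveReplicas.get(shard) < REPLICAS_PER_SHARD && polls < MAX_POLLS) {
                    // A real implementation would sleep here. Nothing can change
                    // liveReplicas meanwhile, because the remaining cores are
                    // loaded by this same thread after the loop ends.
                    polls++;
                }
                wastedPolls += polls;
            }
        }
        return wastedPolls;
    }

    public static void main(String[] args) {
        String[][] cores = {
            {"core1", "shard1"}, {"core2", "shard2"},
            {"core3", "shard1"}, {"core4", "shard2"}
        };
        System.out.println("wasted polls: " + loadSequentially(cores));
    }
}
```

With this assumed layout, both shards burn the full polling budget before their second core is even loaded. In the sketch the wait disappears only if another thread brings the replicas up while the first core is waiting.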

stack trace:
Thread [main] (Suspended)
        owns: HashMap<K,V>  (id=3876)
        owns: StandardContext  (id=3877)
        owns: HashMap<K,V>  (id=3878)
        owns: StandardHost  (id=3879)
        owns: StandardEngine  (id=3880)
        owns: Service[]  (id=3881)
        Thread.sleep(long) line: not available [native method]
        ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
        ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
        LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
        LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
        LeaderElector.joinElection(ElectionContext) line: 262
        ZkController.joinElection(CoreDescriptor, boolean) line: 733
        ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
        ZkController.register(String, CoreDescriptor) line: 532
        CoreContainer.registerInZk(SolrCore) line: 709
        CoreContainer.register(String, SolrCore, boolean) line: 693
        CoreContainer.load(String, InputSource) line: 535
        CoreContainer.load(String, File) line: 356
        CoreContainer$Initializer.initialize() line: 308
        SolrDispatchFilter.init(FilterConfig) line: 107
        ApplicationFilterConfig.getFilter() line: 295
        ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
        ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
        StandardContext.filterStart() line: 4072
        StandardContext.start() line: 4726
        StandardHost(ContainerBase).addChildInternal(Container) line: 799
        StandardHost(ContainerBase).addChild(Container) line: 779
        StandardHost.addChild(Container) line: 601
        HostConfig.deployDescriptor(String, File, String) line: 675
        HostConfig.deployDescriptors(File, String[]) line: 601
        HostConfig.deployApps() line: 502
        HostConfig.start() line: 1317
        HostConfig.lifecycleEvent(LifecycleEvent) line: 324
        LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
        StandardHost(ContainerBase).start() line: 1065
        StandardHost.start() line: 840
        StandardEngine(ContainerBase).start() line: 1057
        StandardEngine.start() line: 463
        StandardService.start() line: 525
        StandardServer.start() line: 754
        Catalina.start() line: 595
        NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
        NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
        DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
        Method.invoke(Object, Object...) line: not available
        Bootstrap.start() line: 289
        Bootstrap.main(String[]) line: 414

       
After a while, the session times out and the following exception appears:
Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
        at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
        at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
        at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
        at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
        at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
        at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
        at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
        at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
        at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
        at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
        at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
        at org.apache.catalina.core.StandardService.start(StandardService.java:525)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)

Followed by:
Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=collection1
Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1
Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

