Date: Mon, 12 Nov 2012 00:43:13 +0000 (UTC)
From: "Mark Miller (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (SOLR-3993) SolrCloud leader election on single node stucks the initialization

[ https://issues.apache.org/jira/browse/SOLR-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495057#comment-13495057 ]

Mark Miller commented on SOLR-3993:
-----------------------------------

I've written a test for that issue that fails before SOLR-4063 and passes after it. I'll commit it shortly.
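The probable cause in the report quoted below is an ordering problem. The first core to win a shard election blocks the single loading thread while it waits for replicas, and those replicas can only be loaded after that wait returns. The following is a minimal Java model of that situation under stated assumptions. `ReplicaWaitModel`, `waitForReplicas` and `loadShardSequentially` are hypothetical names, not Solr APIs. The polling loop only mimics the `total`/`found`/`timeoutin` values in the quoted log and does not reproduce `ShardLeaderElectionContext`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of the startup hang described in SOLR-3993 (not Solr code).
 * One shard, several replica cores, all loaded one after another on one thread.
 */
class ReplicaWaitModel {

    /**
     * Polls the registered-replica count until it reaches {@code total} or
     * {@code timeoutMs} elapses. Returns true only if all replicas were seen.
     */
    static boolean waitForReplicas(AtomicInteger registered, int total,
                                   long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            int found = registered.get();
            if (found >= total) {
                return true;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // give up, like "taking too long" in the quoted log
            }
            System.out.printf("Waiting until we see more replicas up: total=%d found=%d timeoutin=%d%n",
                    total, found, remaining);
            try {
                Thread.sleep(Math.min(pollMs, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    /**
     * Loads {@code replicas} cores of one shard sequentially on the calling thread.
     * The first core becomes leader and waits before the loop can reach the others,
     * so with more than one replica the wait can only end by timing out.
     */
    static boolean loadShardSequentially(int replicas, long timeoutMs) {
        AtomicInteger registered = new AtomicInteger();
        boolean leaderSawAll = false;
        for (int core = 0; core < replicas; core++) {
            registered.incrementAndGet(); // this core joins the shard's election
            if (core == 0) {
                // Assumption: the waiting leader counts itself as one registered replica.
                leaderSawAll = waitForReplicas(registered, replicas, timeoutMs, 50);
            }
        }
        return leaderSawAll;
    }

    public static void main(String[] args) {
        System.out.println("leader saw all replicas: " + loadShardSequentially(2, 300));
    }
}
```

With two replicas the loop never gets past core 0 while the leader is waiting, so the model prints a few "Waiting..." lines and reports that the leader did not see all replicas. This matches the shape, though not the exact numbers, of the quoted log.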
> SolrCloud leader election on single node stucks the initialization
> ------------------------------------------------------------------
>
> Key: SOLR-3993
> URL: https://issues.apache.org/jira/browse/SOLR-3993
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Windows 7, Tomcat 6
> Reporter: Alexey Kudinov
> Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> setup:
> 1 node, 4 cores, 2 shards.
> 15 documents indexed.
> problem:
> init stage times out.
> probable cause:
> According to the init flow, cores are initialized one by one synchronously.
> In practice, the main thread waits in ShardLeaderElectionContext.waitForReplicasToComeUp until the retry threshold is reached, while the replica cores have not been initialized yet. In other words, the other replicas have no chance to come up in the meantime.
> stack trace:
> Thread [main] (Suspended)
>     owns: HashMap (id=3876)
>     owns: StandardContext (id=3877)
>     owns: HashMap (id=3878)
>     owns: StandardHost (id=3879)
>     owns: StandardEngine (id=3880)
>     owns: Service[] (id=3881)
>     Thread.sleep(long) line: not available [native method]
>     ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
>     ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
>     LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
>     LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
>     LeaderElector.joinElection(ElectionContext) line: 262
>     ZkController.joinElection(CoreDescriptor, boolean) line: 733
>     ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
>     ZkController.register(String, CoreDescriptor) line: 532
>     CoreContainer.registerInZk(SolrCore) line: 709
>     CoreContainer.register(String, SolrCore, boolean) line: 693
>     CoreContainer.load(String, InputSource) line: 535
>     CoreContainer.load(String, File) line: 356
>     CoreContainer$Initializer.initialize() line: 308
>     SolrDispatchFilter.init(FilterConfig) line: 107
>     ApplicationFilterConfig.getFilter() line: 295
>     ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
>     ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
>     StandardContext.filterStart() line: 4072
>     StandardContext.start() line: 4726
>     StandardHost(ContainerBase).addChildInternal(Container) line: 799
>     StandardHost(ContainerBase).addChild(Container) line: 779
>     StandardHost.addChild(Container) line: 601
>     HostConfig.deployDescriptor(String, File, String) line: 675
>     HostConfig.deployDescriptors(File, String[]) line: 601
>     HostConfig.deployApps() line: 502
>     HostConfig.start() line: 1317
>     HostConfig.lifecycleEvent(LifecycleEvent) line: 324
>     LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
>     StandardHost(ContainerBase).start() line: 1065
>     StandardHost.start() line: 840
>     StandardEngine(ContainerBase).start() line: 1057
>     StandardEngine.start() line: 463
>     StandardService.start() line: 525
>     StandardServer.start() line: 754
>     Catalina.start() line: 595
>     NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
>     NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
>     DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
>     Method.invoke(Object, Object...) line: not available
>     Bootstrap.start() line: 289
>     Bootstrap.main(String[]) line: 414
>
> After a while, the session times out and the following exception appears:
> Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
> Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
> Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
> SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>     at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
>     at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
>     at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
>     at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
>     at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
>     at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
>     at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
>     at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
>     at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
>     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
>     at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
>     at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
>     at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
>     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
>     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>     at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>     at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>     at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>     at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
>     at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>     at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
>     at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
>     at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>     at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>     at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>     at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>     at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> This is followed by:
> Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> SEVERE: Recovery failed - trying again... core=collection1
> Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover. core=collection1
> Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
>     at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
>     at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
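If the probable cause in the quoted report is right, the leader's wait can only succeed when sibling cores are able to register while the leader is still blocked. That requires registration to run on more than one thread. The following is a minimal sketch of that idea in the same simplified one-shard model, using only `java.util.concurrent`. `ParallelCoreLoadModel` and `loadShardInParallel` are hypothetical names. This is not presented as how SOLR-4063 changed `CoreContainer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical model (not Solr code): register one shard's cores on a thread pool. */
class ParallelCoreLoadModel {

    /**
     * Submits one registration task per replica core to a pool of {@code threads}
     * workers. Core 0 acts as leader and waits up to {@code timeoutMs} for every
     * replica to register. Returns whether the leader saw all replicas in time.
     */
    static boolean loadShardInParallel(int replicas, int threads, long timeoutMs) {
        CountDownLatch allRegistered = new CountDownLatch(replicas);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int core = 0; core < replicas; core++) {
                final boolean isLeader = (core == 0);
                results.add(pool.submit(() -> {
                    allRegistered.countDown(); // this core joins the shard's election
                    // Only the leader blocks; the other replicas return immediately.
                    return isLeader ? allRegistered.await(timeoutMs, TimeUnit.MILLISECONDS) : true;
                }));
            }
            return results.get(0).get(); // the leader's verdict
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            pool.shutdown(); // already-queued tasks still run; no new work is accepted
        }
    }

    public static void main(String[] args) {
        System.out.println("2 threads: " + loadShardInParallel(2, 2, 1000));
        System.out.println("1 thread:  " + loadShardInParallel(2, 1, 300));
    }
}
```

The thread count is the important design point. With a single worker, the leader's task occupies the only thread and the remaining replica sits in the queue, so the model falls back to the sequential behavior and the wait times out. The pool needs free threads for the other replicas while a leader is blocked.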