lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vamsee Yarlagadda (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-7198) Deleting a collection during leader election results in left over znodes in ZK
Date Fri, 06 Mar 2015 19:05:40 GMT

     [ https://issues.apache.org/jira/browse/SOLR-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vamsee Yarlagadda updated SOLR-7198:
------------------------------------
    Description: 
I am seeing this issue while trying to run a collection delete operation in the middle of
leader election process.

Contents of ZK (after issuing collection delete and waiting for some time)
{code}
  /
    /aliases.json
    /clusterstate.json
    /collections
      SolrUpgrade_collection
        leaders
           shard2_1
    /configs
    /live_nodes
    /overseer
    /overseer_elect
    /solr
    /solr.xml

Contents of znode shard2_1:

version0
aversion0
children_count0
ctimeThu Mar 05 22:05:28 PST 2015 (1425621928169)
cversion0
czxid22815
ephemeralOwner93427899815755800
mtimeThu Mar 05 22:05:28 PST 2015 (1425621928169)
mzxid22815
pzxid22815
dataLength194
{
  "core":"SolrUpgrade_collection_shard2_1_replica2",
  "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
  "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"
}
{code}

Clusterstate.json before running a delete collection
{code}
{
  "shards":{
    "shard1":{
      "range":"80000000-ffffffff",
      "state":"active",
      "replicas":{
        "core_node1":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard1_replica2",
          "node_name":"search-testing-c5-ha-4.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-4.vpc.cloudera.com:8983/solr",
          "leader":"true"},
        "core_node2":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard1_replica1",
          "node_name":"search-testing-c5-ha-2.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-2.vpc.cloudera.com:8983/solr"}}},
    "shard2_0":{
      "range":"0-3fffffff",
      "state":"active",
      "replicas":{
        "core_node5":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_0_replica1",
          "node_name":"search-testing-c5-ha-3.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-3.vpc.cloudera.com:8983/solr",
          "leader":"true"},
        "core_node7":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_0_replica2",
          "node_name":"search-testing-c5-ha-4.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-4.vpc.cloudera.com:8983/solr"}}},
    "shard2_1":{
      "range":"40000000-7fffffff",
      "state":"active",
      "replicas":{
        "core_node8":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_1_replica2",
          "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"},
        "core_node9":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_1_replica3",
          "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"}}}},
  "maxShardsPerNode":"10",
  "router":"compositeId",
  "replicationFactor":"2",
  "autoAddReplicas":"false",
  "routerSpec":{"name":"compositeId"}}}
{code}

As we can notice, shard (*shard2_1*) doesn't have any leader and i can see from the logs that
the replicas of the shard just started the leader election process. And here are the Solr
logs from one of the above replicas which eventually becomes the leader and registers in ZK
even though the collection was deleted.
{code}
2015-03-05 22:05:25,383 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Running the
leader process for shard shard2_1
2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Checking if
I (core=SolrUpgrade_collection_shard2_1_replica2,coreNodeName=core_node8) should try and be
the leader.
2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: My last published
State was Active, it's okay to be the leader.
2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: I may be the
new leader - try and sync
2015-03-05 22:05:25,506 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
2015-03-05 22:05:25,620 INFO org.apache.solr.core.SolrCore.Request: [SolrUpgrade_collection_shard2_1_replica2]
webapp=/solr path=/select params={q=*:*&distrib=false&wt=javabin&version=2} hits=118
status=0 QTime=1 
2015-03-05 22:05:25,623 INFO org.apache.solr.core.SolrCore.Request: [SolrUpgrade_collection_shard2_1_replica3]
webapp=/solr path=/select params={q=*:*&distrib=false&wt=javabin&version=2} hits=118
status=0 QTime=0 
2015-03-05 22:05:27,392 INFO org.apache.solr.cloud.ElectionContext: canceling election /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node8-n_0000000001
2015-03-05 22:05:27,393 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica3]
 CLOSING SolrCore org.apache.solr.core.SolrCore@559b52d3
2015-03-05 22:05:27,393 INFO org.apache.solr.update.UpdateHandler: closing DirectUpdateHandler2{commits=0,autocommit
maxTime=60000ms,autocommits=0,soft autocommit maxTime=1000ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
2015-03-05 22:05:27,394 INFO org.apache.solr.update.SolrCoreState: Closing SolrCoreState
2015-03-05 22:05:27,394 INFO org.apache.solr.update.DefaultSolrCoreState: SolrCoreState ref
count has reached 0 - closing IndexWriter
2015-03-05 22:05:27,394 INFO org.apache.solr.update.DefaultSolrCoreState: closing IndexWriter
with IndexWriterCloser
2015-03-05 22:05:27,397 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null
path=/admin/cores params={deleteInstanceDir=true&action=UNLOAD&core=SolrUpgrade_collection_shard2_1_replica2&wt=javabin&qt=/admin/cores&deleteDataDir=true&version=2}
status=0 QTime=5 
2015-03-05 22:05:27,399 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Running the
leader process for shard shard2_1
2015-03-05 22:05:27,399 INFO org.apache.solr.cloud.ElectionContext: canceling election /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
2015-03-05 22:05:27,400 WARN org.apache.solr.cloud.LeaderElector: 
org.apache.solr.common.SolrException: SolrCore not found:SolrUpgrade_collection_shard2_1_replica3
in []
	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:211)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:273)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-03-05 22:05:27,415 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica3]
Closing main searcher on request.
2015-03-05 22:05:27,438 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
2015-03-05 22:05:27,456 INFO org.apache.solr.core.CachingDirectoryFactory: Closing HdfsDirectoryFactory
- 2 directories currently being tracked
2015-03-05 22:05:27,457 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index;done=false>>]
2015-03-05 22:05:27,457 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index
2015-03-05 22:05:27,470 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index
2015-03-05 22:05:27,470 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data;done=false>>]
2015-03-05 22:05:27,470 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
2015-03-05 22:05:27,470 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
2015-03-05 22:05:27,471 INFO org.apache.solr.core.CachingDirectoryFactory: Removing directory
after core close: hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
2015-03-05 22:05:27,490 INFO org.apache.solr.cloud.ElectionContext: canceling election /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
2015-03-05 22:05:27,491 WARN org.apache.solr.cloud.ElectionContext: cancelElection did not
find election node to remove /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
2015-03-05 22:05:27,493 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null
path=/admin/cores params={deleteInstanceDir=true&action=UNLOAD&core=SolrUpgrade_collection_shard2_1_replica3&wt=javabin&qt=/admin/cores&deleteDataDir=true&version=2}
status=0 QTime=100 
2015-03-05 22:05:27,741 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.SyncStrategy: Sync replicas to http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr/SolrUpgrade_collection_shard2_1_replica2/
2015-03-05 22:05:27,888 ERROR org.apache.solr.cloud.SyncStrategy: Sync Failed:org.apache.solr.common.cloud.ZooKeeperException:
Could not find collection in zk: SolrUpgrade_collection []
	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:600)
	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:588)
	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:583)
	at org.apache.solr.cloud.SyncStrategy.syncWithReplicas(SyncStrategy.java:152)
	at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:122)
	at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:101)
	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:267)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:273)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.SyncStrategy: Leader's attempt to sync
with shard failed, moving to the next candidate
2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.ShardLeaderElectionContext: We failed sync,
but we have no versions - we can't sync in that case - we were active before, so become leader
anyway
2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.ShardLeaderElectionContext: I am the new
leader: http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr/SolrUpgrade_collection_shard2_1_replica2/
shard2_1
2015-03-05 22:05:27,888 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica2]
 CLOSING SolrCore org.apache.solr.core.SolrCore@595167
2015-03-05 22:05:27,889 INFO org.apache.solr.update.UpdateHandler: closing DirectUpdateHandler2{commits=1,autocommit
maxTime=60000ms,autocommits=0,soft autocommit maxTime=1000ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
2015-03-05 22:05:27,890 INFO org.apache.solr.update.SolrCoreState: Closing SolrCoreState
2015-03-05 22:05:27,890 INFO org.apache.solr.update.DefaultSolrCoreState: SolrCoreState ref
count has reached 0 - closing IndexWriter
2015-03-05 22:05:27,890 INFO org.apache.solr.update.DefaultSolrCoreState: closing IndexWriter
with IndexWriterCloser
2015-03-05 22:05:27,905 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica2]
Closing main searcher on request.
2015-03-05 22:05:27,911 INFO org.apache.solr.core.CachingDirectoryFactory: Closing HdfsDirectoryFactory
- 2 directories currently being tracked
2015-03-05 22:05:27,912 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index;done=false>>]
2015-03-05 22:05:27,912 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index
2015-03-05 22:05:27,924 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index
2015-03-05 22:05:27,924 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data;done=false>>]
2015-03-05 22:05:27,924 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
2015-03-05 22:05:27,925 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
2015-03-05 22:05:28,026 INFO org.apache.solr.core.CachingDirectoryFactory: Removing directory
after core close: hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
2015-03-05 22:05:28,155 INFO org.apache.solr.common.cloud.SolrZkClient: makePath: /collections/SolrUpgrade_collection/leaders/shard2_1
2015-03-05 22:05:28,287 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
{code}


This might be because of a missing check at line
 https://github.com/apache/lucene-solr/blob/372e8448021d00d3466b45da8a6e57736354eee8/solr/core/src/java/org/apache/solr/cloud/ElectionContext.java#L138
As the makePath creates all the missing directories recursively.

May be we can check whether the collection znode exists before we create the leader znode?
Thoughts? 



  was:
I am seeing this issue while trying to run a collection delete operation in the middle of
leader election process.

Contents of ZK (after issuing collection delete and waiting for some time)
{code}
  /
    /aliases.json
    /clusterstate.json
    /collections
      SolrUpgrade_collection
        leaders
           shard2_1
    /configs
    /live_nodes
    /overseer
    /overseer_elect
    /solr
    /solr.xml

Contents of znode shard2_1:

version0
aversion0
children_count0
ctimeThu Mar 05 22:05:28 PST 2015 (1425621928169)
cversion0
czxid22815
ephemeralOwner93427899815755800
mtimeThu Mar 05 22:05:28 PST 2015 (1425621928169)
mzxid22815
pzxid22815
dataLength194
{
  "core":"SolrUpgrade_collection_shard2_1_replica2",
  "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
  "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"
}
{code}

Clusterstate.json before running a delete collection
{code}
{
  "shards":{
    "shard1":{
      "range":"80000000-ffffffff",
      "state":"active",
      "replicas":{
        "core_node1":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard1_replica2",
          "node_name":"search-testing-c5-ha-4.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-4.vpc.cloudera.com:8983/solr",
          "leader":"true"},
        "core_node2":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard1_replica1",
          "node_name":"search-testing-c5-ha-2.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-2.vpc.cloudera.com:8983/solr"}}},
    "shard2_0":{
      "range":"0-3fffffff",
      "state":"active",
      "replicas":{
        "core_node5":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_0_replica1",
          "node_name":"search-testing-c5-ha-3.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-3.vpc.cloudera.com:8983/solr",
          "leader":"true"},
        "core_node7":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_0_replica2",
          "node_name":"search-testing-c5-ha-4.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-4.vpc.cloudera.com:8983/solr"}}},
    "shard2_1":{
      "range":"40000000-7fffffff",
      "state":"active",
      "replicas":{
        "core_node8":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_1_replica2",
          "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"},
        "core_node9":{
          "state":"active",
          "core":"SolrUpgrade_collection_shard2_1_replica3",
          "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
          "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"}}}},
  "maxShardsPerNode":"10",
  "router":"compositeId",
  "replicationFactor":"2",
  "autoAddReplicas":"false",
  "routerSpec":{"name":"compositeId"}}}
{code}

As we can notice, shard (*shard2_1*) doesn't have any leader and i can see from the logs that
the replicas of the shard just started the leader election process. And here are the Solr
logs from one of the above replicas which eventually becomes the leader and registers in ZK
even though the collection was deleted.
{code}
2015-03-05 22:05:25,383 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Running the
leader process for shard shard2_1
2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Checking if
I (core=SolrUpgrade_collection_shard2_1_replica2,coreNodeName=core_node8) should try and be
the leader.
2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: My last published
State was Active, it's okay to be the leader.
2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: I may be the
new leader - try and sync
2015-03-05 22:05:25,506 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
2015-03-05 22:05:25,620 INFO org.apache.solr.core.SolrCore.Request: [SolrUpgrade_collection_shard2_1_replica2]
webapp=/solr path=/select params={q=*:*&distrib=false&wt=javabin&version=2} hits=118
status=0 QTime=1 
2015-03-05 22:05:25,623 INFO org.apache.solr.core.SolrCore.Request: [SolrUpgrade_collection_shard2_1_replica3]
webapp=/solr path=/select params={q=*:*&distrib=false&wt=javabin&version=2} hits=118
status=0 QTime=0 
2015-03-05 22:05:27,392 INFO org.apache.solr.cloud.ElectionContext: canceling election /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node8-n_0000000001
2015-03-05 22:05:27,393 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica3]
 CLOSING SolrCore org.apache.solr.core.SolrCore@559b52d3
2015-03-05 22:05:27,393 INFO org.apache.solr.update.UpdateHandler: closing DirectUpdateHandler2{commits=0,autocommit
maxTime=60000ms,autocommits=0,soft autocommit maxTime=1000ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
2015-03-05 22:05:27,394 INFO org.apache.solr.update.SolrCoreState: Closing SolrCoreState
2015-03-05 22:05:27,394 INFO org.apache.solr.update.DefaultSolrCoreState: SolrCoreState ref
count has reached 0 - closing IndexWriter
2015-03-05 22:05:27,394 INFO org.apache.solr.update.DefaultSolrCoreState: closing IndexWriter
with IndexWriterCloser
2015-03-05 22:05:27,397 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null
path=/admin/cores params={deleteInstanceDir=true&action=UNLOAD&core=SolrUpgrade_collection_shard2_1_replica2&wt=javabin&qt=/admin/cores&deleteDataDir=true&version=2}
status=0 QTime=5 
2015-03-05 22:05:27,399 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Running the
leader process for shard shard2_1
2015-03-05 22:05:27,399 INFO org.apache.solr.cloud.ElectionContext: canceling election /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
2015-03-05 22:05:27,400 WARN org.apache.solr.cloud.LeaderElector: 
org.apache.solr.common.SolrException: SolrCore not found:SolrUpgrade_collection_shard2_1_replica3
in []
	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:211)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:273)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-03-05 22:05:27,415 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica3]
Closing main searcher on request.
2015-03-05 22:05:27,438 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
2015-03-05 22:05:27,456 INFO org.apache.solr.core.CachingDirectoryFactory: Closing HdfsDirectoryFactory
- 2 directories currently being tracked
2015-03-05 22:05:27,457 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index;done=false>>]
2015-03-05 22:05:27,457 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index
2015-03-05 22:05:27,470 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index
2015-03-05 22:05:27,470 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data;done=false>>]
2015-03-05 22:05:27,470 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
2015-03-05 22:05:27,470 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
2015-03-05 22:05:27,471 INFO org.apache.solr.core.CachingDirectoryFactory: Removing directory
after core close: hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
2015-03-05 22:05:27,490 INFO org.apache.solr.cloud.ElectionContext: canceling election /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
2015-03-05 22:05:27,491 WARN org.apache.solr.cloud.ElectionContext: cancelElection did not
find election node to remove /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
2015-03-05 22:05:27,493 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null
path=/admin/cores params={deleteInstanceDir=true&action=UNLOAD&core=SolrUpgrade_collection_shard2_1_replica3&wt=javabin&qt=/admin/cores&deleteDataDir=true&version=2}
status=0 QTime=100 
2015-03-05 22:05:27,741 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.SyncStrategy: Sync replicas to http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr/SolrUpgrade_collection_shard2_1_replica2/
2015-03-05 22:05:27,888 ERROR org.apache.solr.cloud.SyncStrategy: Sync Failed:org.apache.solr.common.cloud.ZooKeeperException:
Could not find collection in zk: SolrUpgrade_collection []
	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:600)
	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:588)
	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:583)
	at org.apache.solr.cloud.SyncStrategy.syncWithReplicas(SyncStrategy.java:152)
	at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:122)
	at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:101)
	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:267)
	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:273)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.SyncStrategy: Leader's attempt to sync
with shard failed, moving to the next candidate
2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.ShardLeaderElectionContext: We failed sync,
but we have no versions - we can't sync in that case - we were active before, so become leader
anyway
2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.ShardLeaderElectionContext: I am the new
leader: http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr/SolrUpgrade_collection_shard2_1_replica2/
shard2_1
2015-03-05 22:05:27,888 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica2]
 CLOSING SolrCore org.apache.solr.core.SolrCore@595167
2015-03-05 22:05:27,889 INFO org.apache.solr.update.UpdateHandler: closing DirectUpdateHandler2{commits=1,autocommit
maxTime=60000ms,autocommits=0,soft autocommit maxTime=1000ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
2015-03-05 22:05:27,890 INFO org.apache.solr.update.SolrCoreState: Closing SolrCoreState
2015-03-05 22:05:27,890 INFO org.apache.solr.update.DefaultSolrCoreState: SolrCoreState ref
count has reached 0 - closing IndexWriter
2015-03-05 22:05:27,890 INFO org.apache.solr.update.DefaultSolrCoreState: closing IndexWriter
with IndexWriterCloser
2015-03-05 22:05:27,905 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica2]
Closing main searcher on request.
2015-03-05 22:05:27,911 INFO org.apache.solr.core.CachingDirectoryFactory: Closing HdfsDirectoryFactory
- 2 directories currently being tracked
2015-03-05 22:05:27,912 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index;done=false>>]
2015-03-05 22:05:27,912 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index
2015-03-05 22:05:27,924 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index
2015-03-05 22:05:27,924 INFO org.apache.solr.core.CachingDirectoryFactory: looking to close
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data;done=false>>]
2015-03-05 22:05:27,924 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
2015-03-05 22:05:27,925 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
2015-03-05 22:05:28,026 INFO org.apache.solr.core.CachingDirectoryFactory: Removing directory
after core close: hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
2015-03-05 22:05:28,155 INFO org.apache.solr.common.cloud.SolrZkClient: makePath: /collections/SolrUpgrade_collection/leaders/shard2_1
2015-03-05 22:05:28,287 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred
- updating... (live nodes size: 4)
{code}


This might be because of a missing check at line
https://github.com/apache/lucene-solr/blob/lucene_solr_4_10/solr/core/src/java/org/apache/solr/cloud/ElectionContext.java#L138
As the makePath creates all the missing directories recursively.

May be we can check whether the collection znode exists before we create the leader znode?
Thoughts? 




> Deleting a collection during leader election results in left over znodes in ZK 
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-7198
>                 URL: https://issues.apache.org/jira/browse/SOLR-7198
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10.3
>            Reporter: Vamsee Yarlagadda
>
> I am seeing this issue while trying to run a collection delete operation in the middle
of leader election process.
> Contents of ZK (after issuing collection delete and waiting for some time)
> {code}
>   /
>     /aliases.json
>     /clusterstate.json
>     /collections
>       SolrUpgrade_collection
>         leaders
>            shard2_1
>     /configs
>     /live_nodes
>     /overseer
>     /overseer_elect
>     /solr
>     /solr.xml
> Contents of znode shard2_1:
> version0
> aversion0
> children_count0
> ctimeThu Mar 05 22:05:28 PST 2015 (1425621928169)
> cversion0
> czxid22815
> ephemeralOwner93427899815755800
> mtimeThu Mar 05 22:05:28 PST 2015 (1425621928169)
> mzxid22815
> pzxid22815
> dataLength194
> {
>   "core":"SolrUpgrade_collection_shard2_1_replica2",
>   "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
>   "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"
> }
> {code}
> Clusterstate.json before running a delete collection
> {code}
> {
>   "shards":{
>     "shard1":{
>       "range":"80000000-ffffffff",
>       "state":"active",
>       "replicas":{
>         "core_node1":{
>           "state":"active",
>           "core":"SolrUpgrade_collection_shard1_replica2",
>           "node_name":"search-testing-c5-ha-4.vpc.cloudera.com:8983_solr",
>           "base_url":"http://search-testing-c5-ha-4.vpc.cloudera.com:8983/solr",
>           "leader":"true"},
>         "core_node2":{
>           "state":"active",
>           "core":"SolrUpgrade_collection_shard1_replica1",
>           "node_name":"search-testing-c5-ha-2.vpc.cloudera.com:8983_solr",
>           "base_url":"http://search-testing-c5-ha-2.vpc.cloudera.com:8983/solr"}}},
>     "shard2_0":{
>       "range":"0-3fffffff",
>       "state":"active",
>       "replicas":{
>         "core_node5":{
>           "state":"active",
>           "core":"SolrUpgrade_collection_shard2_0_replica1",
>           "node_name":"search-testing-c5-ha-3.vpc.cloudera.com:8983_solr",
>           "base_url":"http://search-testing-c5-ha-3.vpc.cloudera.com:8983/solr",
>           "leader":"true"},
>         "core_node7":{
>           "state":"active",
>           "core":"SolrUpgrade_collection_shard2_0_replica2",
>           "node_name":"search-testing-c5-ha-4.vpc.cloudera.com:8983_solr",
>           "base_url":"http://search-testing-c5-ha-4.vpc.cloudera.com:8983/solr"}}},
>     "shard2_1":{
>       "range":"40000000-7fffffff",
>       "state":"active",
>       "replicas":{
>         "core_node8":{
>           "state":"active",
>           "core":"SolrUpgrade_collection_shard2_1_replica2",
>           "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
>           "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"},
>         "core_node9":{
>           "state":"active",
>           "core":"SolrUpgrade_collection_shard2_1_replica3",
>           "node_name":"search-testing-c5-ha-1.vpc.cloudera.com:8983_solr",
>           "base_url":"http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr"}}}},
>   "maxShardsPerNode":"10",
>   "router":"compositeId",
>   "replicationFactor":"2",
>   "autoAddReplicas":"false",
>   "routerSpec":{"name":"compositeId"}}}
> {code}
> As we can notice, shard (*shard2_1*) doesn't have any leader and i can see from the logs
that the replicas of the shard just started the leader election process. And here are the
Solr logs from one of the above replicas which eventually becomes the leader and registers
in ZK even though the collection was deleted.
> {code}
> 2015-03-05 22:05:25,383 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Running
the leader process for shard shard2_1
> 2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Checking
if I (core=SolrUpgrade_collection_shard2_1_replica2,coreNodeName=core_node8) should try and
be the leader.
> 2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: My last
published State was Active, it's okay to be the leader.
> 2015-03-05 22:05:25,387 INFO org.apache.solr.cloud.ShardLeaderElectionContext: I may
be the new leader - try and sync
> 2015-03-05 22:05:25,506 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state
change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has
occurred - updating... (live nodes size: 4)
> 2015-03-05 22:05:25,620 INFO org.apache.solr.core.SolrCore.Request: [SolrUpgrade_collection_shard2_1_replica2]
webapp=/solr path=/select params={q=*:*&distrib=false&wt=javabin&version=2} hits=118
status=0 QTime=1 
> 2015-03-05 22:05:25,623 INFO org.apache.solr.core.SolrCore.Request: [SolrUpgrade_collection_shard2_1_replica3]
webapp=/solr path=/select params={q=*:*&distrib=false&wt=javabin&version=2} hits=118
status=0 QTime=0 
> 2015-03-05 22:05:27,392 INFO org.apache.solr.cloud.ElectionContext: canceling election
/collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node8-n_0000000001
> 2015-03-05 22:05:27,393 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica3]
 CLOSING SolrCore org.apache.solr.core.SolrCore@559b52d3
> 2015-03-05 22:05:27,393 INFO org.apache.solr.update.UpdateHandler: closing DirectUpdateHandler2{commits=0,autocommit
maxTime=60000ms,autocommits=0,soft autocommit maxTime=1000ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
> 2015-03-05 22:05:27,394 INFO org.apache.solr.update.SolrCoreState: Closing SolrCoreState
> 2015-03-05 22:05:27,394 INFO org.apache.solr.update.DefaultSolrCoreState: SolrCoreState
ref count has reached 0 - closing IndexWriter
> 2015-03-05 22:05:27,394 INFO org.apache.solr.update.DefaultSolrCoreState: closing IndexWriter
with IndexWriterCloser
> 2015-03-05 22:05:27,397 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null
path=/admin/cores params={deleteInstanceDir=true&action=UNLOAD&core=SolrUpgrade_collection_shard2_1_replica2&wt=javabin&qt=/admin/cores&deleteDataDir=true&version=2}
status=0 QTime=5 
> 2015-03-05 22:05:27,399 INFO org.apache.solr.cloud.ShardLeaderElectionContext: Running
the leader process for shard shard2_1
> 2015-03-05 22:05:27,399 INFO org.apache.solr.cloud.ElectionContext: canceling election
/collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
> 2015-03-05 22:05:27,400 WARN org.apache.solr.cloud.LeaderElector: 
> org.apache.solr.common.SolrException: SolrCore not found:SolrUpgrade_collection_shard2_1_replica3
in []
> 	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:211)
> 	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
> 	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
> 	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
> 	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
> 	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:273)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-03-05 22:05:27,415 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica3]
Closing main searcher on request.
> 2015-03-05 22:05:27,438 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state
change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has
occurred - updating... (live nodes size: 4)
> 2015-03-05 22:05:27,456 INFO org.apache.solr.core.CachingDirectoryFactory: Closing HdfsDirectoryFactory
- 2 directories currently being tracked
> 2015-03-05 22:05:27,457 INFO org.apache.solr.core.CachingDirectoryFactory: looking to
close hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index;done=false>>]
> 2015-03-05 22:05:27,457 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index
> 2015-03-05 22:05:27,470 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data/index
> 2015-03-05 22:05:27,470 INFO org.apache.solr.core.CachingDirectoryFactory: looking to
close hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data;done=false>>]
> 2015-03-05 22:05:27,470 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
> 2015-03-05 22:05:27,470 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
> 2015-03-05 22:05:27,471 INFO org.apache.solr.core.CachingDirectoryFactory: Removing directory
after core close: hdfs://ns1/solr/SolrUpgrade_collection/core_node9/data
> 2015-03-05 22:05:27,490 INFO org.apache.solr.cloud.ElectionContext: canceling election
/collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
> 2015-03-05 22:05:27,491 WARN org.apache.solr.cloud.ElectionContext: cancelElection did
not find election node to remove /collections/SolrUpgrade_collection/leader_elect/shard2_1/election/93427899815755803-core_node9-n_0000000002
> 2015-03-05 22:05:27,493 INFO org.apache.solr.servlet.SolrDispatchFilter: [admin] webapp=null
path=/admin/cores params={deleteInstanceDir=true&action=UNLOAD&core=SolrUpgrade_collection_shard2_1_replica3&wt=javabin&qt=/admin/cores&deleteDataDir=true&version=2}
status=0 QTime=100 
> 2015-03-05 22:05:27,741 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state
change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has
occurred - updating... (live nodes size: 4)
> 2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.SyncStrategy: Sync replicas to http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr/SolrUpgrade_collection_shard2_1_replica2/
> 2015-03-05 22:05:27,888 ERROR org.apache.solr.cloud.SyncStrategy: Sync Failed:org.apache.solr.common.cloud.ZooKeeperException:
Could not find collection in zk: SolrUpgrade_collection []
> 	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:600)
> 	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:588)
> 	at org.apache.solr.common.cloud.ZkStateReader.getReplicaProps(ZkStateReader.java:583)
> 	at org.apache.solr.cloud.SyncStrategy.syncWithReplicas(SyncStrategy.java:152)
> 	at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:122)
> 	at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:101)
> 	at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:267)
> 	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
> 	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
> 	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
> 	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
> 	at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:273)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.SyncStrategy: Leader's attempt to
sync with shard failed, moving to the next candidate
> 2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.ShardLeaderElectionContext: We failed
sync, but we have no versions - we can't sync in that case - we were active before, so become
leader anyway
> 2015-03-05 22:05:27,888 INFO org.apache.solr.cloud.ShardLeaderElectionContext: I am the
new leader: http://search-testing-c5-ha-1.vpc.cloudera.com:8983/solr/SolrUpgrade_collection_shard2_1_replica2/
shard2_1
> 2015-03-05 22:05:27,888 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica2]
 CLOSING SolrCore org.apache.solr.core.SolrCore@595167
> 2015-03-05 22:05:27,889 INFO org.apache.solr.update.UpdateHandler: closing DirectUpdateHandler2{commits=1,autocommit
maxTime=60000ms,autocommits=0,soft autocommit maxTime=1000ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
> 2015-03-05 22:05:27,890 INFO org.apache.solr.update.SolrCoreState: Closing SolrCoreState
> 2015-03-05 22:05:27,890 INFO org.apache.solr.update.DefaultSolrCoreState: SolrCoreState
ref count has reached 0 - closing IndexWriter
> 2015-03-05 22:05:27,890 INFO org.apache.solr.update.DefaultSolrCoreState: closing IndexWriter
with IndexWriterCloser
> 2015-03-05 22:05:27,905 INFO org.apache.solr.core.SolrCore: [SolrUpgrade_collection_shard2_1_replica2]
Closing main searcher on request.
> 2015-03-05 22:05:27,911 INFO org.apache.solr.core.CachingDirectoryFactory: Closing HdfsDirectoryFactory
- 2 directories currently being tracked
> 2015-03-05 22:05:27,912 INFO org.apache.solr.core.CachingDirectoryFactory: looking to
close hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index;done=false>>]
> 2015-03-05 22:05:27,912 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index
> 2015-03-05 22:05:27,924 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data/index
> 2015-03-05 22:05:27,924 INFO org.apache.solr.core.CachingDirectoryFactory: looking to
close hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data [CachedDir<<refCount=0;path=hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data;done=false>>]
> 2015-03-05 22:05:27,924 INFO org.apache.solr.core.CachingDirectoryFactory: Closing directory:
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
> 2015-03-05 22:05:27,925 INFO org.apache.solr.store.hdfs.HdfsDirectory: Closing hdfs directory
hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
> 2015-03-05 22:05:28,026 INFO org.apache.solr.core.CachingDirectoryFactory: Removing directory
after core close: hdfs://ns1/solr/SolrUpgrade_collection/core_node8/data
> 2015-03-05 22:05:28,155 INFO org.apache.solr.common.cloud.SolrZkClient: makePath: /collections/SolrUpgrade_collection/leaders/shard2_1
> 2015-03-05 22:05:28,287 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state
change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has
occurred - updating... (live nodes size: 4)
> {code}
> This might be because of a missing check at line
>  https://github.com/apache/lucene-solr/blob/372e8448021d00d3466b45da8a6e57736354eee8/solr/core/src/java/org/apache/solr/cloud/ElectionContext.java#L138
> As the makePath creates all the missing directories recursively.
> May be we can check whether the collection znode exists before we create the leader znode?
Thoughts? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message