lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-9836) Add more graceful recovery steps when failing to create SolrCore
Date Fri, 03 Mar 2017 21:18:45 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895044#comment-15895044 ]

Steve Rowe edited comment on SOLR-9836 at 3/3/17 9:17 PM:
----------------------------------------------------------

{{MissingSegmentRecoveryTest.testLeaderRecovery()}} has been failing regularly on Jenkins.
Something changed on or about February 10th: the failure rate went up considerably
and has remained at that elevated level since.

I got 3 failures beasting 100 iterations of the test suite using Miller's beasting script
on my box.  However, for the past three weeks I've gotten several failures a day on my Jenkins,
and roughly one a day on either ASF or Policeman Jenkins.

Here's a recent failure [https://builds.apache.org/job/Lucene-Solr-Tests-master/1699/]:

{noformat}
  [junit4]   2> 599977 ERROR (coreLoadExecutor-3254-thread-1-processing-n:127.0.0.1:41308_solr) [n:127.0.0.1:41308_solr c:MissingSegmentRecoveryTest s:shard1 r:core_node1 x:MissingSegmentRecoveryTest_shard1_replica2] o.a.s.u.SolrIndexWriter Error closing IndexWriter
  [junit4]   2> java.nio.file.NoSuchFileException: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/solr/build/solr-core/test/J2/temp/solr.cloud.MissingSegmentRecoveryTest_B800C15EC6F11C02-001/tempDir-001/node2/MissingSegmentRecoveryTest_shard1_replica2/data/index.20170228030909468/write.lock
  [junit4]   2> 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  [junit4]   2> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  [junit4]   2> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  [junit4]   2> 	at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
  [junit4]   2> 	at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
  [junit4]   2> 	at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
  [junit4]   2> 	at java.nio.file.Files.readAttributes(Files.java:1737)
  [junit4]   2> 	at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:177)
  [junit4]   2> 	at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:67)
  [junit4]   2> 	at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4698)
  [junit4]   2> 	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3093)
  [junit4]   2> 	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3227)
  [junit4]   2> 	at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1136)
  [junit4]   2> 	at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1179)
  [junit4]   2> 	at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:291)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:728)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:911)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:828)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.processCoreCreateException(CoreContainer.java:1011)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:939)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.lambda$load$3(CoreContainer.java:572)
  [junit4]   2> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  [junit4]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  [junit4]   2> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[...]
  [junit4]   2> 600005 ERROR (coreContainerWorkExecutor-3250-thread-1-processing-n:127.0.0.1:41308_solr) [n:127.0.0.1:41308_solr    ] o.a.s.c.CoreContainer Error waiting for SolrCore to be created
  [junit4]   2> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [MissingSegmentRecoveryTest_shard1_replica2]
  [junit4]   2> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  [junit4]   2> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.lambda$load$4(CoreContainer.java:600)
  [junit4]   2> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  [junit4]   2> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  [junit4]   2> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  [junit4]   2> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  [junit4]   2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
  [junit4]   2> Caused by: org.apache.solr.common.SolrException: Unable to create core [MissingSegmentRecoveryTest_shard1_replica2]
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:952)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.lambda$load$3(CoreContainer.java:572)
  [junit4]   2> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  [junit4]   2> 	... 5 more
  [junit4]   2> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
  [junit4]   2> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:964)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:828)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.processCoreCreateException(CoreContainer.java:1011)
  [junit4]   2> 	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:939)
  [junit4]   2> 	... 7 more
  [junit4]   2> 	Suppressed: org.apache.solr.common.SolrException: Error opening new searcher
  [junit4]   2> 		at org.apache.solr.core.SolrCore.<init>(SolrCore.java:964)
  [junit4]   2> 		at org.apache.solr.core.SolrCore.<init>(SolrCore.java:828)
  [junit4]   2> 		at org.apache.solr.core.CoreContainer.create(CoreContainer.java:937)
  [junit4]   2> 		... 7 more
  [junit4]   2> 	Caused by: org.apache.solr.common.SolrException: Error opening new searcher
  [junit4]   2> 		at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2005)
  [junit4]   2> 		at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2125)
  [junit4]   2> 		at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1053)
  [junit4]   2> 		at org.apache.solr.core.SolrCore.<init>(SolrCore.java:937)
  [junit4]   2> 		... 9 more
  [junit4]   2> 	Caused by: org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/solr/build/solr-core/test/J2/temp/solr.cloud.MissingSegmentRecoveryTest_B800C15EC6F11C02-001/tempDir-001/node2/MissingSegmentRecoveryTest_shard1_replica2/data/index/segments_2")))
  [junit4]   2> 		at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:286)
  [junit4]   2> 		at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
  [junit4]   2> 		at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:125)
  [junit4]   2> 		at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
  [junit4]   2> 		at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:240)
  [junit4]   2> 		at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:114)
  [junit4]   2> 		at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1966)
  [junit4]   2> 		... 12 more
  [junit4]   2> 	Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/solr/build/solr-core/test/J2/temp/solr.cloud.MissingSegmentRecoveryTest_B800C15EC6F11C02-001/tempDir-001/node2/MissingSegmentRecoveryTest_shard1_replica2/data/index/segments_2")
  [junit4]   2> 		at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
  [junit4]   2> 		at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
  [junit4]   2> 		at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
  [junit4]   2> 		at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:296)
  [junit4]   2> 		at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
  [junit4]   2> 		... 18 more
  [junit4]   2> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
  [junit4]   2> 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2005)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2125)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1053)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:937)
  [junit4]   2> 	... 10 more
  [junit4]   2> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in LockValidatingDirectoryWrapper(MMapDirectory@/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-master/solr/build/solr-core/test/J2/temp/solr.cloud.MissingSegmentRecoveryTest_B800C15EC6F11C02-001/tempDir-001/node2/MissingSegmentRecoveryTest_shard1_replica2/data/index.20170228030909468 lockFactory=org.apache.lucene.store.NativeFSLockFactory@74782755): files: [write.lock]
  [junit4]   2> 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:933)
  [junit4]   2> 	at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:125)
  [junit4]   2> 	at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
  [junit4]   2> 	at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:240)
  [junit4]   2> 	at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:114)
  [junit4]   2> 	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1966)
[...]
  [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=MissingSegmentRecoveryTest -Dtests.method=testLeaderRecovery -Dtests.seed=B800C15EC6F11C02 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=fi-FI -Dtests.timezone=Asia/Famagusta -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
  [junit4] FAILURE 94.6s J2 | MissingSegmentRecoveryTest.testLeaderRecovery <<<
  [junit4]    > Throwable #1: java.lang.AssertionError: Expected a collection with one shard and two replicas
  [junit4]    > null
  [junit4]    > Last available state: DocCollection(MissingSegmentRecoveryTest//collections/MissingSegmentRecoveryTest/state.json/6)={
  [junit4]    >   "replicationFactor":"2",
  [junit4]    >   "shards":{"shard1":{
  [junit4]    >       "range":"80000000-7fffffff",
  [junit4]    >       "state":"active",
  [junit4]    >       "replicas":{
  [junit4]    >         "core_node1":{
  [junit4]    >           "core":"MissingSegmentRecoveryTest_shard1_replica2",
  [junit4]    >           "base_url":"https://127.0.0.1:41308/solr",
  [junit4]    >           "node_name":"127.0.0.1:41308_solr",
  [junit4]    >           "state":"down"},
  [junit4]    >         "core_node2":{
  [junit4]    >           "core":"MissingSegmentRecoveryTest_shard1_replica1",
  [junit4]    >           "base_url":"https://127.0.0.1:60247/solr",
  [junit4]    >           "node_name":"127.0.0.1:60247_solr",
  [junit4]    >           "state":"active",
  [junit4]    >           "leader":"true"}}}},
  [junit4]    >   "router":{"name":"compositeId"},
  [junit4]    >   "maxShardsPerNode":"1",
  [junit4]    >   "autoAddReplicas":"false"}
  [junit4]    > 	at __randomizedtesting.SeedInfo.seed([B800C15EC6F11C02:E855595D9FD0AA1F]:0)
  [junit4]    > 	at org.apache.solr.cloud.SolrCloudTestCase.waitForState(SolrCloudTestCase.java:265)
  [junit4]    > 	at org.apache.solr.cloud.MissingSegmentRecoveryTest.testLeaderRecovery(MissingSegmentRecoveryTest.java:105)
[...]
  [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {_version_=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128))), id=FST50}, docValues:{}, maxPointsInLeafNode=1106, maxMBSortInHeap=6.191537660994534, sim=RandomSimilarity(queryNorm=true): {}, locale=fi-FI, timezone=Asia/Famagusta
  [junit4]   2> NOTE: Linux 3.13.0-85-generic amd64/Oracle Corporation 1.8.0_121 (64-bit)/cpus=4,threads=1,free=138683768,total=527433728
{noformat}
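
The last available state in the log shows why the assertion fires: {{core_node1}} is still "down", so only one of the two replicas of shard1 is active. A minimal Java sketch of that kind of check follows, assuming replica states are available as a plain map from replica name to state string. The class and method names are hypothetical, and this is not the predicate that SolrCloudTestCase itself uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicaStateCheck {

    // Hypothetical helper: counts replicas whose state string is "active".
    static long countActive(Map<String, String> replicaStates) {
        return replicaStates.values().stream().filter("active"::equals).count();
    }

    // True only when the shard has exactly expectedReplicas replicas and every one is active.
    static boolean allActive(Map<String, String> replicaStates, int expectedReplicas) {
        return replicaStates.size() == expectedReplicas
            && countActive(replicaStates) == expectedReplicas;
    }

    public static void main(String[] args) {
        // Replica states copied from the "Last available state" section of the log.
        Map<String, String> shard1 = new LinkedHashMap<>();
        shard1.put("core_node1", "down");
        shard1.put("core_node2", "active");

        // Prints false: one of the two replicas never came back up.
        System.out.println(allActive(shard1, 2));
    }
}
```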



was (Author: steve_rowe):
{{MissingSegmentRecoveryTest.testLeaderRecovery()}} has been failing pretty regularly on Jenkins.
 Something happened on or about February 10th, when the probability of failure went up considerably
(and has since remained at this elevated level).

I got 3 failures beasting 100 iterations of the test suite using Miller's beasting script
on my box.  However, for the past three weeks I've seen this several times a day on my Jenkins,
and roughly once a day on either ASF or Policeman Jenkins.



> Add more graceful recovery steps when failing to create SolrCore
> ----------------------------------------------------------------
>
>                 Key: SOLR-9836
>                 URL: https://issues.apache.org/jira/browse/SOLR-9836
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Mike Drob
>            Assignee: Mark Miller
>             Fix For: 6.5, master (7.0)
>
>         Attachments: SOLR-9836.patch, SOLR-9836.patch, SOLR-9836.patch, SOLR-9836.patch, SOLR-9836.patch, SOLR-9836.patch, SOLR-9836.patch
>
>
> I have seen several cases where there is a zero-length segments_n file. We haven't identified the root cause of these issues (possibly a poorly timed crash during replication?) but if there is another node available then Solr should be able to recover from this situation. Currently, we log and give up on loading that core, leaving the user to manually intervene.
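
The two on-disk conditions seen in this issue (a zero-length segments_n file, and an index directory that contains only write.lock) can be told apart with a plain filesystem check. The following is a minimal Java sketch using only java.nio.file; the class name, enum, and method are hypothetical, and it is neither Solr's recovery logic nor a substitute for Lucene's own commit validation, which reads and checksums the file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SegmentsFileCheck {

    // Hypothetical classification of an index directory, for illustration only.
    enum Status { NO_SEGMENTS_FILE, ZERO_LENGTH_SEGMENTS_FILE, OK }

    static Status inspect(Path indexDir) throws IOException {
        // Collect every file whose name matches segments_N.
        List<Path> segmentsFiles = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path p : stream) {
                segmentsFiles.add(p);
            }
        }
        if (segmentsFiles.isEmpty()) {
            return Status.NO_SEGMENTS_FILE;
        }
        // A size of zero bytes is the condition described in the issue.
        for (Path p : segmentsFiles) {
            if (Files.size(p) == 0L) {
                return Status.ZERO_LENGTH_SEGMENTS_FILE;
            }
        }
        return Status.OK;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");

        // Only a lock file, like the index.20170228030909468 directory in the log.
        Files.createFile(dir.resolve("write.lock"));
        System.out.println(inspect(dir)); // NO_SEGMENTS_FILE

        // An empty commit file, like the one the issue describes.
        Files.createFile(dir.resolve("segments_2"));
        System.out.println(inspect(dir)); // ZERO_LENGTH_SEGMENTS_FILE

        // A non-empty file passes this size-only check (it says nothing about validity).
        Files.write(dir.resolve("segments_2"), new byte[] {1, 2, 3, 4});
        System.out.println(inspect(dir)); // OK
    }
}
```

A size check like this only flags the obvious case; a truncated but non-empty segments_n file would still need Lucene itself to detect it.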



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

