lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: [JENKINS] Solr-trunk - Build # 1865 - Failure
Date Fri, 25 May 2012 12:44:45 GMT

On May 25, 2012, at 8:11 AM, Sami Siren wrote:

> Just thinking out loud... shouldn't solr(cloud) manage such a situation
> gracefully?

Currently, you can handle it gracefully if you raise the graceful shutdown timeout in Jetty.
That's easy enough to do with the Jetty we ship, but it's painful (extremely, it seems) to do
in tests.

In any case, I don't think it hurts anything in practice? The merge thread fails, so you simply
don't get those merges, I think. The problem with the tests is that the exception is thrown
from the merge thread. We have no effect on that from Solr - the test framework picks up an
uncaught exception in the thread, and our goose is cooked.
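
To illustrate the mechanism only: a minimal sketch of how a harness can notice an exception
that escapes a background thread the calling code never touches. This is not the
randomizedtesting implementation; the class and method names (UncaughtDemo, RecordingGroup,
runInBackground) are made up for the example, and IllegalStateException stands in for the
real AlreadyClosedException.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtDemo {
    /** Thread group that records the first exception escaping any of its threads. */
    static final class RecordingGroup extends ThreadGroup {
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        RecordingGroup(String name) {
            super(name);
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            // Called on the failing thread itself; the code that started it sees nothing.
            failure.compareAndSet(null, e);
        }
    }

    /** Runs body on a background thread and returns whatever escaped it, or null. */
    static Throwable runInBackground(Runnable body) {
        RecordingGroup group = new RecordingGroup("test-group");
        Thread worker = new Thread(group, body, "background-merge");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return group.failure.get();
    }

    public static void main(String[] args) {
        Throwable seen = runInBackground(() -> {
            // Hypothetical stand-in for the merge thread hitting a closed directory.
            throw new IllegalStateException("this Directory is closed");
        });
        System.out.println("uncaught in background thread: " + seen);
    }
}
```

The caller of runInBackground never gets a chance to catch the exception; only the group's
handler sees it, which is why catching things on the Solr side does not help here.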

> I mean in real life solr instances can be killed or even
> whole servers can go away. Would it be ok to ignore that exception
> instead?

It's at the Lucene level really, so unless we try really hard to work around it, we would
have to figure out whether something different makes sense there, I think.

Right now, if it's waiting for merges to finish and gets interrupted, it throws an interrupted
exception. Unless we explicitly try to kill the current merge threads, I'd think that could
be a problem in any general code. You close the IW with wait for merges to finish = true,
then you start closing other resources, because you assume you are done with the IW, but in
fact merges can still be running if the thread was interrupted. And you might close resources
that merging depends on (i.e., the directory).
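
A rough sketch of that race using plain threads, not Lucene classes. Everything here is a
stand-in invented for the example: the "merge" thread plays a running merge, the "closer"
thread plays the shutdown path, and an AtomicBoolean plays the directory.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptedCloseDemo {
    /** Returns true if the resource was closed while the worker was still using it. */
    static boolean closedUnderWorker() {
        AtomicBoolean resourceClosed = new AtomicBoolean(false);
        AtomicBoolean workerSawClosed = new AtomicBoolean(false);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for a merge thread: busy until released, then touches the resource.
        Thread merge = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                return;
            }
            workerSawClosed.set(resourceClosed.get());
        }, "merge");
        merge.start();

        // Stand-in for the shutdown path: wait for the merge, then close the resource.
        Thread closer = new Thread(() -> {
            try {
                merge.join(); // "wait for merges to finish"
            } catch (InterruptedException e) {
                // The wait is abandoned here, but the merge thread is still alive.
            }
            resourceClosed.set(true); // closes the "directory" regardless
        }, "closer");
        closer.start();
        closer.interrupt(); // the container loses patience during shutdown

        try {
            closer.join();
            release.countDown(); // only now does the merge get to use the resource
            merge.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo itself was interrupted", e);
        }
        return workerSawClosed.get();
    }

    public static void main(String[] args) {
        System.out.println("resource closed under worker: " + closedUnderWorker());
    }
}
```

The point is only that an interrupted wait returns control to the closing code while the
worker is still alive, so anything closed afterwards can disappear from under it.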

Lucene does not like interruptions in other cases either, but unfortunately, running in a
webapp, we can't always easily avoid them, it seems.

> --
> Sami Siren
> 
> On Fri, May 25, 2012 at 3:01 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> I actually know what this one is now.
>> 
>> Jetty is shutting down, and the graceful timeout is too low, and so jetty interrupts
>> the webapp, and while we are waiting for merges to finish on IW#close, an interrupt is thrown
>> and we stop waiting. So the directory is then closed out from under the merge thread. So really,
>> mostly a test issue it seems?
>> 
>> So I changed our jetty instances in tests to a 30 second graceful shutdown. Tests
>> went from 6 minutes for me, to 33 minutes. I won't make this fix for now :) One idea is to
>> perhaps do it just for this test - but even then it makes the test *much* longer, and there
>> is no reason it can't happen on other tests that use jetty instances. It just happens to only
>> show up in this test currently AFAICT.
>> 
>> On May 25, 2012, at 5:30 AM, Apache Jenkins Server wrote:
>> 
>>> Build: https://builds.apache.org/job/Solr-trunk/1865/
>>> 
>>> 1 tests failed.
>>> REGRESSION:  org.apache.solr.cloud.RecoveryZkTest.testDistribSearch
>>> 
>>> Error Message:
>>> Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,]
>>> 
>>> Stack Trace:
>>> java.lang.RuntimeException: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,]
>>>       at com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>>>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>>>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>>>       at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>>>       at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>>>       at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>>>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>>       at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>>>       at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>>>       at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>>>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>>       at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>>> Caused by: org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>>>       at __randomizedtesting.SeedInfo.seed([8B4A827F28B6F16]:0)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480)
>>> Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>>>       at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244)
>>>       at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241)
>>>       at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:345)
>>>       at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3031)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
>>> 
>>> Build Log (for compile errors):
>>> [...truncated 41930 lines...]
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org
>> 
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com