lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Solr Replication Test Case Failure
Date Sun, 01 Aug 2010 01:06:28 GMT
On Sat, Jul 31, 2010 at 12:38 PM, Yonik Seeley
<yonik@lucidimagination.com> wrote:
> FYI, I'm now running this in a loop on my ubuntu box, without the
> retry-loop, trying to replicate a failure.

FYI, I've hit 3 failures so far... all of the form "Connection
refused"/"Jetty/Solr unresponsive", which may be related to SOLR-2019.

-Yonik
http://www.lucidimagination.com


> -Yonik
> http://www.lucidimagination.com
>
> On Sat, Jul 31, 2010 at 11:52 AM, Yonik Seeley
> <yonik@lucidimagination.com> wrote:
>> OK, can you try to reproduce now?
>> Since the comments indicated that all the commits were to bump up the
>> index version number, I kept them all and just inserted an additional
>> commit in the query retry loop.
>>
>> But actually... there may still be a bug somewhere (even if this fixes
>> the test failures).
>> Each commit should wait for a new searcher to be registered before
>> returning... hence it should be impossible for overlapping warming
>> searchers to be responsible for the failure.  Hence when the test
>> fails, either the doc add, or the commit is failing.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Sat, Jul 31, 2010 at 11:35 AM, Yonik Seeley
>> <yonik@lucidimagination.com> wrote:
>>> Do the logs give any hints?
>>> Downside of only logging SEVERE is that it's much harder to
>>> investigate the cause of any intermittent failures that do happen.
>>>
>>> Looking at this test code, you shouldn't have to wait at all.  The
>>> test disables replication, indexes docs to the slave, commits (and
>>> waits for a new searcher to be registered), and then queries the
>>> slave.
>>>
>>> We should just remove that wait loop.
>>>
>>> Oh... I think I just figured it out while writing this...
>>>
>>>    index(slaveClient, "id", 551, "name", "name = " + 551);
>>>    slaveClient.commit(true, true);
>>>    index(slaveClient, "id", 552, "name", "name = " + 552);
>>>    slaveClient.commit(true, true);
>>>    index(slaveClient, "id", 553, "name", "name = " + 553);
>>>    slaveClient.commit(true, true);
>>>    index(slaveClient, "id", 554, "name", "name = " + 554);
>>>    slaveClient.commit(true, true);
>>>    index(slaveClient, "id", 555, "name", "name = " + 555);
>>>    slaveClient.commit(true, true);
>>>
>>> I bet that last commit can fail due to max warming searchers.
>>> I'll fix.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>> On Sat, Jul 31, 2010 at 8:41 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>
>>>>
>>>>  This looks like it might actually be an issue - it fails once every 20
>>>> runs or so as a guess.
>>>>
>>>>    [junit] Testsuite: org.apache.solr.handler.TestReplicationHandler
>>>>    [junit] Testcase: testReplicateAfterWrite2Slave(org.apache.solr.handler.TestReplicationHandler): FAILED
>>>>    [junit] expected:<1> but was:<0>
>>>>    [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0>
>>>>    [junit]     at org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave(TestReplicationHandler.java:464)
>>>>    [junit]
>>>>    [junit]
>>>>    [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 343.909 sec
>>>>
>>>> At first I tried to extend the wait for it, but that's obviously no help
>>>> - in this case the test failed after running for 343 seconds. I've seen it
>>>> as high as 968 seconds.
>>>>
>>>> - Mark
>>>
>>
>
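
The change described in the thread (keeping the existing commits and inserting an additional commit in the query retry loop) can be illustrated with a minimal, self-contained sketch. It does not use Solr or SolrJ: FakeIndex, commitsToFail, and queryWithCommitRetry are hypothetical names invented for this example, and the rejected commit only simulates the failure mode guessed at above (a commit failing due to max warming searchers), which the thread does not confirm as the root cause.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitRetrySketch {

    /** Hypothetical stand-in for the slave index; this is NOT Solr's API. */
    static class FakeIndex {
        private final List<Integer> pending = new ArrayList<>();
        private final List<Integer> visible = new ArrayList<>();
        private int commitsToFail; // assumption: number of commits that get rejected

        FakeIndex(int commitsToFail) {
            this.commitsToFail = commitsToFail;
        }

        /** Buffers a document id; it is not searchable until a commit succeeds. */
        void add(int id) {
            pending.add(id);
        }

        /** Returns false when the simulated commit is rejected. */
        boolean commit() {
            if (commitsToFail > 0) {
                commitsToFail--;
                return false;
            }
            visible.addAll(pending);
            pending.clear();
            return true;
        }

        /** Number of searchable documents with the given id (0 or 1 here). */
        int count(int id) {
            return visible.contains(id) ? 1 : 0;
        }
    }

    /** Re-commits before each query attempt; returns 0 if nothing is found after maxTries. */
    static int queryWithCommitRetry(FakeIndex index, int id, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            index.commit(); // the additional commit inside the retry loop
            int hits = index.count(id);
            if (hits > 0) {
                return hits;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex(1); // the first commit will be rejected
        index.add(555);
        System.out.println("first commit ok: " + index.commit());
        System.out.println("hits without retry: " + index.count(555));
        System.out.println("hits with retry: " + queryWithCommitRetry(index, 555, 3));
    }
}
```

A retry loop that only waits and re-queries can never see the document if the commit itself was rejected, which would be consistent with the test still failing after hundreds of seconds. As noted in the thread, re-committing inside the loop may hide the symptom without explaining why the add or the commit failed in the first place.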
