directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: Replication reboot test failure : headsup
Date Sat, 03 Sep 2011 08:02:04 GMT
On 9/3/11 9:31 AM, Selcuk AYA wrote:
> On Fri, Sep 2, 2011 at 3:31 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
>> Hi guys,
>>
>> Finally, I found the reason why this test was failing:
>> the SyncInfoValue is not a control, but part of the IntermediateMessage.
>> Its value is stored in the IntermediateMessage value, and should be
>> decoded locally. That was the case a while back
>> (http://svn.apache.org/viewvc/directory/apacheds/trunk/protocol-ldap/src/main/java/org/apache/directory/server/ldap/replication/SyncReplConsumer.java?r1=1151533&r2=1152183&pathrev=1153306&diff_format=h),
>> but it was replaced by some code assuming the SyncInfoValue was a control.
>>
>> There was also another problem: the ads-replCookie AT is an operational
>> attribute, and the lookup we did wasn't returning it. This has been fixed.
>>
>> I have added a CONSUMER_LOG logger to help analyze replication. We
>> now have PRODUCER_LOG for the producer and CONSUMER_LOG for the consumer.
>> Understanding what's going on on both sides should be easier (even if, right
>> now, we don't have enough logs).
>>
>> Sadly, the reboot test seems to fail randomly. In Eclipse, I have it working,
>> but it fails from time to time. It seems that we have a timing issue
>> somewhere. If I enable the logs, it does not fail anymore.
>>
> In my case, these random failures were caused by errors during
> deserialization while reading from the event log. I created
> DIRSERVER-1653 and attached a diff that fixes the issue. Please try
> that patch. After this fix, testReboot passed both in Eclipse and
> Maven for me.

Meh. Kiran and I had been searching for the cause of the failure for more
than two days. We found various bugs, but we focused on everything but
serialization until yesterday afternoon. I spent 4 hours trying to
understand what was wrong, comparing all the writeExternal methods,
isolating a test, decoding the byte array by hand. Eventually, I crashed
at 3 am with no explanation, only to wake up and find your mail.

Man, it fixes the test! I could have spent many more hours without
finding this problem; I would never have thought of it...

Btw, there are other places in the code where we call read() instead of
readFully(). I'll fix them...
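For readers unfamiliar with the read()/readFully() pitfall, here is a minimal Java sketch of it. This is not the ApacheDS code or the DIRSERVER-1653 patch: `ShortReadDemo`, `TrickleInputStream` and the two helper methods are hypothetical names. The point it illustrates is that `InputStream.read(byte[])` is allowed to return fewer bytes than the buffer length, while `DataInput.readFully(byte[])` (also available on `ObjectInput`, which extends `DataInput`) keeps reading until the buffer is full or throws `EOFException`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ShortReadDemo {

    /** Hypothetical stream that returns at most 'chunk' bytes per read call,
     *  which the InputStream contract explicitly permits. */
    static class TrickleInputStream extends FilterInputStream {
        private final int chunk;

        TrickleInputStream(InputStream in, int chunk) {
            super(in);
            this.chunk = chunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    /** Buggy pattern: assumes a single read() call fills the whole buffer. */
    static byte[] readWithSingleRead(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        in.read(buf); // may return fewer than 'length' bytes; the rest stay 0
        return buf;
    }

    /** Correct pattern: readFully() loops until the buffer is full (or throws EOFException). */
    static byte[] readWithReadFully(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        // prints [1, 2, 3, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(
            readWithSingleRead(new TrickleInputStream(new ByteArrayInputStream(data), 3), 8)));
        // prints [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(Arrays.toString(
            readWithReadFully(new TrickleInputStream(new ByteArrayInputStream(data), 3), 8)));
    }
}
```

Whether a short read actually happens depends on the underlying stream and its buffering, which would be consistent with the random, timing-dependent failures described earlier in the thread.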


Many, many thanks !


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

