directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lecharny <elecha...@apache.org>
Subject heads up about DIRSERVER-1377/1369
Date Sun, 19 Jul 2009 10:04:04 GMT
Hi,

so I think the issue is now fixed (crossing fingers). I write this mail 
in order to give some info about what has been done.

First, I would like to thank Kiran who seconded me during the long 
process of trying to get the bug fixed. I would like to thank Sumit 
Goyal and Murali Krishna who found the bug and submitted a test to 
exhibit the problem. It was incredibly useful.

It took me a month - even if I didn't worked only on this problem, as I 
also had to deal with some client at the same time -, but I can tell 
that it was the most complicated problem I have ever had to work on. I'm 
not 100% sure what exactly was the problem, I'm afraid that I just put 
it aside when I removed the AvlTree implentation to replace it with some 
simpler data structure.

When the issue was first created, I thought it was a problem with MINA, 
as we had a concurrency issue with version 2.0.0-M4. Then I bumped up 
the version to 2.0.0-M5, and closed the initial issue. Sadly, after some 
more tests, another issue showed up, with a different error message, and 
the JIRA was reopened.

A test was provided, which was the most useful thing possible, as it 
helped me to reproduce the error on my computer. We started some more 
debugging session, and the problem occurred quite immediately.

But first, we have seen another problem : an index was growing without 
bound (SubLevel). We debugged the server with Kiran, and found the 
reason. At least, it was easy to fix, even if it wasn't related to the 
initial issue. Lesson learned : when something is wrong, it's very 
likely that while investigating the problem, you find some other 
problems around :). The bad news is that many parts of the server need 
more thorough tests :/

While reviewing the index handling, as the problem seemed to be in this 
area (some few stackTrace helped to show that, at least : sometime old 
school debugging with printf come to the rescue... ), we also cleaned 
many small things, like useless deletion of non existing elements in index.

Then we started to think about the way the server stores data on disk. 
Basically, you have N threads updating many index and the master table, 
and all those data read and write is done in a thread-safe part of the 
server (Jdbm backend). As every access to JDBM is synchronized, we 
discarded the idea that JDBM was buggy at this point. It narrowed the 
issue to what is in between the interceptor chain and JDBM.

Then we saw that the place we massage index was not thread safe. In 
other words, as we have to update around 10 index when updating an 
entry, it must be done in a way that guarantee that the index can't be 
fooled by another thread. This was not the case. We then thought we 
found the reason of the concurrent problem. We fixed that by adding a 
hell of synchronized all over this part of the code.

Great success ! When the multithreaded test was launched, instead of 
failing after a few hundreds of updates, it was now running well for 
thousands of updates (I went up to 700 000 updates before getting a 
breakage).

That was the good news. The bad news was that the problem was still 
around, but deeper. At least, we know for sure that we removed a real 
concurrent issue in the code. So what could be the problem ?

When we are storing data in indexes, we store one key associated to many 
values (for instance, an ObjectClass can be related to many entries). In 
order to manage this, we use a ordered data structure to store those 
values. Namely, depending on the number of values, either an AvlTree or 
a BTree. This AvlTree has to be serialized and deserialized in order to 
be stored on disk. This is where I thought we could have an issue : the 
de/serialization was done outside of the synchronised part, which is not 
a problem per se, but we send and get back a byte[] from JDBM. What is 
fishy here is that JDBM caches some data, and returns a *reference* to 
the data . Getting a reference on a byte[] means that some other thread 
can perfectly modify this byte[] at the same time, with a potential 
impact on the expected result. It sounded like a good catch, so I 
modified JDBM to get it return a copy of the byte[]. It didn't changed 
anything, I still get the error after a few hundred thousands updates.

Next step was to start suspecting the AvlTree implementation. I decided 
that rewriting this part was a sane idea. What a fool ... AvlTree is a 
complex data structure, and writing an iterative implementation needs 
time, and a hell of tests. Basically, as I was stuck with no other 
ideas, I followed this idea, even if it was crazy. (some little voice in 
my head was telling me I'm not any more a student, and this was a lost 
of time, but the other side of my personality told me that I was smart 
enough to do better than what we have ...). The bad thing about Avl 
trees is that documentation about it is *very* rough on the internet. 
There are some C implementations, but they use pointers, something we 
don't have in Java, making it quite complex to implement. We also have 
some specific needs, like we want to be able to move backward and 
forward in the tree, adding more constraints.

After around 2 weeks and a half, having a partially working AVLTree 
impelmentation, I just had a better idea. In the mean time, Kiran has 
implemented an in-jdbm serialization, removing the previous potential 
problem with byte[] copies.

So the new idea was that we don't need AvlTree, we just need a sorted 
array, which was less costly when searching data into it (O(N) for a 
sorted array, compared to a 1.44O(N) for an AvlTree). Sure, insertion 
and removal was more costly, as it needs an array copy, but as we don't 
do a lot of addition or removal, that was a good trade. More important, 
it save a lot of memory, and serialization was way easier (memory shring 
by 4, for references to an entry, as we have less pointers (left and 
right child, previous and next leaf, and the balance)). Less memory 
consumption means more cache, less garbage collection, less pointer 
update, faster read and write on disk.

I ended by writing this ArrayTree implementation, which took me three 
days, with all the necessary tests, and a bit of problems with the 
current Cursor interface (there are some semantic I really don't like in 
this area). Then I just had to substitute AvlTree with ArrayTree, and voilà.

Tests went up to 2 000 000 updates, without any problem. Twice. So I 
closed the issue.

My understanding is that the previous AvlTree had a problem in the way 
it managed next/previous index. I didn't spent time in analysing the 
code, it was too complex. Last, not least, I added a lot of debug logs 
in the server, and tried to see what the logs gave. That led me to 
grep/sed a few gigabytes of logs, with little success. It just sucked 
time, with no result except to eliminate other options.

Last, not least, what made me thing that the bug was a problem in 
AvlTree in some very specific cases, is that we always had a NPE in the 
very same line, and that the analyze of the byte[] taken from the JDBM 
backend shown that there were some missing elements in the serialized 
tree. As the tree was correctly serialized (I added some check in the 
serialize() method : the byte[] was deserialized before being stored in 
JDBM to be sure that the serialization was ok, and I had no problem in 
it), that mean something was broken while rebuilding the AvlTree.

It now seems to work, but I'm not 100% sure the problem is really fixed. 
I think I just pushed some dust in the trash bin, expecting that it was 
not pushing them under the rug...

In any case, this part of the server needs a deep check, as I saw many 
potential problems that needs to be addressed. We probably have to run 
this multi-threaded test on a more powerful computer, to see if it is 
really fixed.


-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org



Mime
View raw message