directory-users mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: [ApacheDS] Entry.next=null, data[removeIndex] Please check that your keys are immutable, and that you have used synchronization properly
Date Thu, 24 Jul 2014 16:14:52 GMT
On 24/07/2014 17:20, Foust, David M. (MSFC-IS80)[EAST] wrote:
> I appreciate the rapid reply Emmanuel. I am personally interested in the new Mavibot
> backend, but it would be a tough sell to the risk averse around here to use an alpha product.
> You allude in your reply to known issues with JDBM. Can you give me a quick rundown
> (or tell me good things to search for on the mailing list)? I'm just wondering if there are
> things I can avoid doing to prevent the issue from happening. Also, what is the roadmap/timeline
> for the Mavibot backend?

My last mail was a bit short.

Let me explain in more detail what's going on with JDBM.

3 years ago, while I was doing some experiments for a talk I was giving,
I discovered with horror what happened with this multi-threaded test:

for each thread do
    add 100 entries
    search all entries
    delete the 100 created entries


(so each thread was reading its own 100 entries, plus the 1900 entries
created by the other threads).

The server crashed consistently after a few seconds.

What happened is that while one thread was doing a search, it grabbed an
index on which it was iterating to return each entry. At the very same
time, the other threads were modifying that index while adding and
removing their own entries (which were part of the search results). The
index wasn't protected against concurrent access, so the search crashed.
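The failure mode can be reproduced in miniature. The sketch below uses a plain `java.util.TreeMap` as a stand-in for the unprotected index (an assumption for illustration only; JDBM's B-tree is not a `TreeMap`), and interleaves a "writer" removal into a "search" scan by hand so the outcome is deterministic rather than timing-dependent. `TreeMap` iterators are fail-fast on a best-effort basis and throw `ConcurrentModificationException`; an index without even that check just keeps reading whatever state it finds.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo: a TreeMap stands in for an index with no concurrency protection. */
class UnprotectedIndexDemo {
    static String searchWhileWriting() {
        TreeMap<String, String> index = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            index.put("uid=user" + i, "entry " + i);
        }
        // The "search" holds a cursor on the index...
        Iterator<Map.Entry<String, String>> cursor = index.entrySet().iterator();
        try {
            while (cursor.hasNext()) {
                Map.Entry<String, String> e = cursor.next();
                if (e.getKey().equals("uid=user3")) {
                    // ...while another "thread" deletes one of its own entries mid-scan.
                    index.remove("uid=user7");
                }
            }
            return "completed";
        } catch (ConcurrentModificationException ex) {
            return "search failed: " + ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(searchWhileWriting());
    }
}
```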

Even worse: the database was completely broken, and the only way to get
it fixed was to reinject the full set of data :/

This was a really bad situation, which made ApacheDS absolutely unreliable.

We tried to fix that in two ways:
- first, we added some locks to protect the backend from concurrent
reads and writes. Basically, while a write is in progress, no read and no
other write can be done: writes are serialized against all other
operations. Obviously, this comes with an extremely heavy cost...
- second, we implemented transaction support for the tables in indexes
(to handle the database corruption, using a journal).
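The first fix (no read and no other write while a write is in progress, while searches can still run alongside each other) maps naturally onto a read/write lock. Below is a minimal sketch under that assumption; `LockedIndex` and `LockedIndexDemo` are hypothetical names, not ApacheDS classes, and the demo replays the add/search/delete test with each thread working on its own entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical index guarded by a read/write lock (not ApacheDS code). */
class LockedIndex {
    private final TreeMap<String, String> entries = new TreeMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(String dn, String entry) {
        lock.writeLock().lock(); // exclusive: waits for every search and every other write
        try {
            entries.put(dn, entry);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void delete(String dn) {
        lock.writeLock().lock();
        try {
            entries.remove(dn);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Searches share the read lock, which is held for the whole scan. */
    List<String> searchAll() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(entries.values());
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return entries.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}

class LockedIndexDemo {
    /** Each thread adds its entries, searches everything, then deletes its own entries. */
    static String run(int threads, int entriesPerThread) {
        LockedIndex index = new LockedIndex();
        AtomicInteger failures = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < entriesPerThread; i++) {
                        index.add("uid=t" + id + "-" + i, "entry");
                    }
                    index.searchAll();
                    for (int i = 0; i < entriesPerThread; i++) {
                        index.delete("uid=t" + id + "-" + i);
                    }
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return "failures=" + failures.get() + ", remaining=" + index.size();
    }
}
```

The cost shows up directly in this design: one long-running search holds the read lock and stalls every writer until it finishes.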

Those two big changes are still present in the server today, but we
still have another issue hitting us from time to time, and we don't have
any clue about what causes it: we still get some corrupted databases,
and we can't explain it (this is what you are experiencing). It's not
frequent, but it's still heavily problematic. One possible explanation
is that if the server is abruptly stopped in the middle of a write
operation, some indexes might have been updated while others haven't
(we don't have cross-table transactions).

Three years ago, an attempt was made to modify JDBM by adding MVCC to
it. To some extent, it was working, but it came with a price we weren't
ready to pay: if any client didn't close the search it was using (and
we have no way to enforce that, except with a timeout), then the
in-memory journal was locked, waiting for some thread to release a
search or whatever operation it was doing on the backend.
I spent countless hours chasing the missing closes in our own tests (and
we have thousands...) in order to get the regression tests to pass
despite this flaw. And every time someone added new code to the
server, with new tests, I had to go through the exact same painful
exercise...
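To see why a single unclosed search could hold everything up, here is a hypothetical sketch (not the actual JDBM MVCC code) of an in-memory journal of pending changes. It assumes a change can only be folded into the base store once every open reader already sees it, so a reader opened at an old version and never closed pins every later change in memory. `VersionedJournal` and its methods are illustrative names.

```java
import java.util.TreeMap;

/** Hypothetical journal: changes leave it only once no open reader predates them. */
class VersionedJournal {
    private long currentVersion = 0;
    private final TreeMap<Long, Integer> openReaders = new TreeMap<>(); // version -> open reader count
    private final TreeMap<Long, String> journal = new TreeMap<>();      // version -> pending change

    synchronized void write(String change) {
        currentVersion++;
        journal.put(currentVersion, change);
        flush();
    }

    /** A search pins the version that was current when it started. */
    synchronized long openReader() {
        openReaders.merge(currentVersion, 1, Integer::sum);
        return currentVersion;
    }

    synchronized void closeReader(long version) {
        openReaders.computeIfPresent(version, (v, n) -> n == 1 ? null : Integer.valueOf(n - 1));
        flush();
    }

    /** Changes up to the oldest open reader's version are visible to all readers and can leave the journal. */
    private void flush() {
        long oldest = openReaders.isEmpty() ? currentVersion : openReaders.firstKey();
        journal.headMap(oldest, true).clear(); // a real store would apply these to disk first
    }

    synchronized int journalSize() {
        return journal.size();
    }
}
```

With one reader opened before 1000 writes, all 1000 changes stay in the journal; as soon as that reader is closed, the journal empties. A client that never closes its search never releases them.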

At some point last year, we had to face reality: JDBM was not
designed to support global transactions, and the way we had implemented
MVCC was not going to fly, at all.

Here comes MVCC (and Mavibot).

From a historical point of view, we had discussed using an MVCC
backend for years (going back to 2006, when we discussed CouchDB with
Alex Karasulu as a potential backend, just as it had been released to
the world. Sadly, it was an Erlang code base). This matured for years
until we had to deal with this issue, but we always thought we could do
it later, buying some time by "fixing" JDBM (actually, we have fixed a
few issues in JDBM, and we have 2 forks of this project in our codebase).

But it was time to stop buying time. Last year, after a lengthy
discussion with Howard Chu (OpenLDAP main architect), we decided to go
ahead with Mavibot. It was an experiment I started in Apache Labs, in
parallel with my work on the server itself. After one year in the Labs,
where Kiran joined the effort, we decided it was time to move the code
into the server, and see if it was working at all, and what kind of
performance we could get out of it. The result was quite positive:
- first, it worked. That means all the regression tests were passing
with Mavibot as a backend;
- and second, it was more than 2 times faster (even for searches).

From a technical point of view, Mavibot has the exact same features as
LMDB:
- thread safe
- transactions across tables
- crash proof
(just listing the few critical features).
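The MVCC idea behind these properties can be sketched with immutable snapshots. This is a hypothetical illustration, not Mavibot's API: all tables of a revision live in one snapshot, readers take the current snapshot without any lock, and a single writer builds a new revision and publishes it with one atomic reference swap, so both tables change together or not at all. `Snapshot` and `MvccStore` are illustrative names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

/** One immutable revision holding every table (hypothetical layout). */
final class Snapshot {
    final long revision;
    final Map<String, String> entries;  // e.g. DN -> entry
    final Map<String, String> uidIndex; // e.g. uid -> DN

    Snapshot(long revision, Map<String, String> entries, Map<String, String> uidIndex) {
        this.revision = revision;
        this.entries = Collections.unmodifiableMap(entries);
        this.uidIndex = Collections.unmodifiableMap(uidIndex);
    }
}

class MvccStore {
    private final AtomicReference<Snapshot> current =
        new AtomicReference<>(new Snapshot(0, new HashMap<>(), new HashMap<>()));
    private final ReentrantLock writerLock = new ReentrantLock(); // one writer at a time

    /** Readers never block: they keep whatever revision was current when they started. */
    Snapshot beginRead() {
        return current.get();
    }

    /** Updates both tables in a single new revision, published atomically. */
    void addEntry(String dn, String uid, String entry) {
        writerLock.lock();
        try {
            Snapshot old = current.get();
            Map<String, String> entries = new HashMap<>(old.entries);
            Map<String, String> uidIndex = new HashMap<>(old.uidIndex);
            entries.put(dn, entry);
            uidIndex.put(uid, dn);
            current.set(new Snapshot(old.revision + 1, entries, uidIndex));
        } finally {
            writerLock.unlock();
        }
    }
}
```

Copying whole maps keeps the sketch short but costs O(n) per write; a copy-on-write B-tree, the approach LMDB takes, only copies the pages on the path from the modified leaf up to the root.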


OTOH, up to now Mavibot was not yet ready to be the backend of choice.
It does not (yet) support free page recovery, which is a real burden
when you have many updates. We didn't have a working bulk load facility,
which was a real pain for two reasons:
- first, when it comes to injecting many entries, it takes way longer
using the API than using a bulk load
- second, the file was growing fast, with no way to make it shrink

Those two issues are being actively worked on. We now have a bulk load
system which still needs to be improved (we are able to bulk load
around 2500 entries per second, but we are memory bound atm), and we
have a working free page management system which also needs to be
thoroughly tested and improved (I know that Kiran is chasing an NPE that
occurs when we update the B-tree holding the free pages while processing
those free pages... So atm we aren't updating this B-tree, which works so
far, but we are losing some pages forever: not a lot, but still).
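Why missing free page recovery makes the file grow: in a copy-on-write store every update writes new pages and leaves the previous revision's pages obsolete. The hypothetical `PageAllocator` below (not Mavibot code) compares always appending new pages against recycling released pages from a free list. For simplicity it assumes no reader still needs an old revision, so each page is released right after its replacement is allocated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical page allocator with optional free-page reuse. */
class PageAllocator {
    private final boolean reclaim;
    private final Deque<Long> freePages = new ArrayDeque<>();
    private long filePages = 0; // pages ever appended to the file

    PageAllocator(boolean reclaim) {
        this.reclaim = reclaim;
    }

    /** Reuses a free page if there is one, otherwise extends the file. */
    long allocate() {
        Long recycled = freePages.pollFirst();
        return recycled != null ? recycled : filePages++;
    }

    /** Should only be called once no reader can still see the revision using this page. */
    void release(long page) {
        if (reclaim) {
            freePages.addLast(page);
        }
    }

    long filePages() {
        return filePages;
    }

    /** Simulates `updates` copy-on-write updates, each rewriting `pagesPerUpdate` pages. */
    static long simulate(boolean reclaim, int updates, int pagesPerUpdate) {
        PageAllocator alloc = new PageAllocator(reclaim);
        long[] live = new long[pagesPerUpdate];
        for (int i = 0; i < pagesPerUpdate; i++) {
            live[i] = alloc.allocate();
        }
        for (int u = 0; u < updates; u++) {
            for (int i = 0; i < pagesPerUpdate; i++) {
                long replacement = alloc.allocate(); // write the new copy first
                alloc.release(live[i]);              // then the old page becomes obsolete
                live[i] = replacement;
            }
        }
        return alloc.filePages();
    }
}
```

With 1000 updates rewriting 4 pages each, `simulate(false, 1000, 4)` ends with 4004 pages in the file, while `simulate(true, 1000, 4)` stays at 5.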


This is why I'm talking about an alpha release. We do expect to get it
out quite quickly (2 weeks? One month?).

What we expect to gain is more than just a faster backend:
- we will have a completely different cache too. JDBM uses its own
cache; we will leverage EhCache.
- we will be able to remove all the locks we have added all over the
server, regaining the concurrent updates and searches we lost in
the process.
- we will have a safer, crash-proof server.
- if the server brutally crashes, we won't have to reindex anything: the
server will restart instantaneously, as it will start back from the last
validated version, only losing the update that was in progress
during the crash.
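One way to restart from the last validated version is sketched below, modelled loosely on LMDB's two meta pages: commits alternate between two header slots, and here each header also carries a CRC32 checksum. This is an assumption for illustration; Mavibot's actual on-disk layout may differ, and `HeaderSlots` is a hypothetical name. A crash can only tear the slot being written, so recovery picks the highest revision whose checksum still verifies.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical double-header scheme; the ByteBuffer stands in for the start of the file. */
class HeaderSlots {
    static final int SLOT_SIZE = 24; // revision (8) + root page (8) + checksum (8)
    private final ByteBuffer file = ByteBuffer.allocate(2 * SLOT_SIZE);

    static long checksum(long revision, long rootPage) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(16).putLong(revision).putLong(rootPage).array());
        return crc.getValue();
    }

    /** Revisions increase by one, so slot (revision % 2) holds the older header. */
    void commit(long revision, long rootPage) {
        int base = (int) (revision % 2) * SLOT_SIZE;
        file.putLong(base, revision);
        file.putLong(base + 8, rootPage);
        file.putLong(base + 16, checksum(revision, rootPage));
    }

    /** Simulates a crash after only the revision field of the new header was written. */
    void tornCommit(long revision) {
        int base = (int) (revision % 2) * SLOT_SIZE;
        file.putLong(base, revision);
        // root page and checksum are never written: the slot keeps stale values
    }

    /** Returns {revision, rootPage} of the newest valid header, or null if none verifies. */
    long[] recover() {
        long[] best = null;
        for (int slot = 0; slot < 2; slot++) {
            int base = slot * SLOT_SIZE;
            long revision = file.getLong(base);
            long rootPage = file.getLong(base + 8);
            boolean valid = file.getLong(base + 16) == checksum(revision, rootPage);
            if (valid && (best == null || revision > best[0])) {
                best = new long[] {revision, rootPage};
            }
        }
        return best;
    }
}
```

After committing revisions 1 and 2 and then crashing while writing revision 3, recovery returns revision 2: only the in-flight update is lost, and nothing needs to be reindexed.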

All those features are well worth the effort.

I hope this gives you an extended explanation of what's going on, and
that you aren't too worried about the current state.

Keep in mind that this is quite a big piece of code (around 660k SLOC,
with Studio) that we have been working on for more than 10 years now (it
all started in 2003).

