directory-dev mailing list archives

From "Zheng, Kai" <kai.zh...@intel.com>
Subject RE: Rethinking Mavibot...
Date Mon, 27 Jun 2016 10:49:23 GMT
Thanks so much for the full explanation, Emmanuel. Sorry for asking again.

>> Otherwise, we could use LMDB, with a JNI wrapper. That is an option. But I have no
>> idea what it would cost us in terms of packaging.
Packaging isn't really much of a problem. There is a nice way to handle it: the native binary
can be put into the jar as a resource, then extracted at startup into a runtime folder and
loaded from there. Many Java projects do it this way.
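
For reference, a minimal sketch of that extract-and-load approach, assuming Java 8 or later.
The class name `NativeLibExtractor` and the resource path in the usage note are hypothetical
and only there for illustration; nothing here is taken from ApacheDS or from an existing LMDB
wrapper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a native library bundled in the jar to a
// runtime folder, then loads it by absolute path.
final class NativeLibExtractor {
    private NativeLibExtractor() {}

    /** Copies the stream into a fresh temporary folder and returns the extracted file. */
    static Path extract(InputStream in, String fileName) {
        try {
            Path dir = Files.createTempDirectory("native-libs");
            // Best-effort cleanup. Deletion runs in reverse registration order,
            // so the file (registered last) is removed before its folder.
            dir.toFile().deleteOnExit();
            Path target = dir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            target.toFile().deleteOnExit();
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot extract " + fileName, e);
        }
    }

    /** Looks the library up on the classpath, extracts it and loads it. */
    static void loadFromJar(String resourcePath, String fileName) {
        try (InputStream in = NativeLibExtractor.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalStateException("Not on the classpath: " + resourcePath);
            }
            // System.load needs an absolute path to the extracted file.
            System.load(extract(in, fileName).toAbsolutePath().toString());
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + resourcePath, e);
        }
    }
}
```

A caller would run something like
`NativeLibExtractor.loadFromJar("/native/linux-x86_64/liblmdb.so", "liblmdb.so")` once at
startup. That resource layout is made up; a real build would pick the folder from the
`os.name` and `os.arch` system properties.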

If we could go this way, I would love to work on it and to help in other areas as much as
possible, because it sounds very promising to come up with a high-performance, highly reliable
LDAP server for the Apache world.

Regards,
Kai

-----Original Message-----
From: Emmanuel Lécharny [mailto:elecharny@gmail.com] 
Sent: Monday, June 27, 2016 6:03 PM
To: Apache Directory Developers List <dev@directory.apache.org>
Subject: Re: Rethinking Mavibot...

Le 27/06/16 à 08:07, Zheng, Kai a écrit :
> Thanks for the update.
>
> It looks to me like there is a lot of work to do. Is there any alternative option? I'm still
> wondering whether we could leverage an existing back end implementation, so that we could focus
> on the LDAP-specific logic for the master server component... This is worth considering
> because in today's industry there are already so many B-tree implementations.
I think we already discussed that matter months ago. I also think that many don't understand
why we *need* something like Mavibot. But let me try to explain again...


Back in 2006, we knew that we were going to have trouble with our choice (JDBM). Back then,
we had little choice though :
- JDBM was the only open source, license-compatible B-tree implementation in Java available.
- We had other more important issues to cope with.

However, during the Austin Apache Conference, at which CouchDB was announced, we had a
long discussion with Alex, Pierre-Arnaud and Ersin about the fact that we would need an
MVCC-based backend. Sadly, CouchDB was written in Erlang, so we had to wait.

We waited until 2011, when it became apparent that concurrent searches and updates would
eventually generate errors (typically, some searches would fail). We added a hell of a lot of
locks, up to the point where it was impossible to do a search while doing an update, which was
a very expensive penalty to pay. At the same time, we started to look at alternatives that did
not involve a rewrite. Someone started to implement MVCC on top of JDBM, but the result was not
pleasant : if for any reason you forgot to close a cursor, the server would go west in a matter
of minutes. We can't force clients to carefully close their cursors, it was simply not an
option, so we ditched the work.

What alternative did we have ? Not much : Berkeley DB had been bought by Oracle, and the
JE wasn't available with a compatible license. And as of today, there isn't any MVCC B-tree
implementation that I know of with a compatible license. So we are in a kind of deadlock.

Funnily enough, at the very same time, OpenLDAP had started to work on the exact same piece
of code, for the exact same reasons (BDB had changed its license, and some data corruption
could occur under certain circumstances, requiring a tool to repair the database). So we
knew we weren't in bad company !

Bottom line, I started to work on a replacement for JDBM, which got pushed to the repository
in January 2012 (I had started to work on it in mid-2011). Kiran ported ApacheDS to use
Mavibot as a backend around Spring 2013, and we now have an ApacheDS server that *works* with
Mavibot. Not only that, but it's also faster than JDBM.

Is it enough ? No. For one single reason : Mavibot with no transaction support won't be any
better than JDBM, for the exact same reasons : if we have a crash, we will potentially end up
with a corrupted database (less often than with JDBM, but still). It's way better though,
because we can't have a failure during a search while updates are being done, and corruption
can be fixed easily.

Mavibot brings another bonus : we can now inject data in bulk mode, which is orders
of magnitude faster than adding data while the server is up and running.


Otherwise, we could use LMDB, with a JNI wrapper. That is an option. But I have no idea what
it would cost us in terms of packaging. Right now, ApacheDS comes as a bundled package, or
as an installer for Linux, Mac OSX and Windows. Having a dependency on a binary component
might be real trouble when it comes to packaging it properly. ATM, I'm not willing to spend
time on this aspect.

Last but not least, Mavibot is *NOT* a plain B-tree implementation. It's an MVCC (Multi-Version
Concurrency Control) B-tree implementation
(https://en.wikipedia.org/wiki/Multiversion_concurrency_control), which is *VERY* different.
The critical aspect is the MVCC part : this is what guarantees consistency and lock-free access
to the underlying database.
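
To make the MVCC idea concrete, here is a minimal sketch under loud assumptions: it is *not*
Mavibot's code or API. The names `MvccSketch` and `Revision` are hypothetical, and a full copy
of a `TreeMap` stands in for a real copy-on-write B-tree, which would copy only the pages on
the path from the root to the modified leaf. It only shows the principle : readers pick up an
immutable revision without taking a lock, and writers publish a new revision instead of
modifying the old one.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of MVCC reads and writes, not Mavibot's API.
final class MvccSketch<K extends Comparable<K>, V> {

    /** One immutable version of the data, tagged with its revision number. */
    static final class Revision<K, V> {
        final long number;
        final NavigableMap<K, V> data;

        Revision(long number, NavigableMap<K, V> data) {
            this.number = number;
            this.data = Collections.unmodifiableNavigableMap(data);
        }
    }

    // Always points to the latest published revision.
    private final AtomicReference<Revision<K, V>> current =
        new AtomicReference<>(new Revision<K, V>(0L, new TreeMap<K, V>()));

    /** Readers take the current revision without a lock; it never changes afterwards. */
    Revision<K, V> snapshot() {
        return current.get();
    }

    /** Writers are serialized; each write publishes a new revision. */
    synchronized long put(K key, V value) {
        Revision<K, V> old = current.get();
        // Full copy for simplicity; a real B-tree would copy only the touched pages.
        TreeMap<K, V> copy = new TreeMap<K, V>(old.data);
        copy.put(key, value);
        Revision<K, V> next = new Revision<K, V>(old.number + 1, copy);
        current.set(next);
        return next.number;
    }
}
```

A search that started on revision N keeps seeing revision N, whatever updates happen in the
meantime, which is why no lock is needed between searches and updates in this scheme.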

What we are lacking ATM is cross-B-tree transaction support. This is what will bring
two critical improvements to the ApacheDS server :

1) No need to implement a mechanism to restore a database if it crashes in the middle of an
update (an LDAP update requires multiple updates to multiple indexes, typically 10 at minimum,
with some indexes, like the RDN index, being updated more than once).
2) Speed ! During a transaction, we work in memory until we are done (with a commit or an
abort). That saves us multiple updates on disk.
Typically, we would save 50% of the writes for a single Add operation.
That means an Add would be twice as fast.
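
The buffering part of point 2 can be sketched as follows. This is a simplified illustration,
not the planned Mavibot design : `TxnSketch`, `Txn` and the `diskWrites` counter are
hypothetical names, in-memory maps stand in for the on-disk B-trees, and the commit shown here
is not crash-safe. It only shows that updates made inside a transaction stay in memory, and
that an index touched several times is written once at commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of buffering updates until commit.
final class TxnSketch {
    /** Stand-in for the on-disk B-trees, keyed by index name. */
    final Map<String, Map<String, String>> disk = new TreeMap<>();
    /** Counts simulated disk writes, one per index flushed at commit. */
    int diskWrites = 0;

    final class Txn {
        // Updates buffered in memory, grouped by index.
        private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

        void put(String index, String key, String value) {
            pending.computeIfAbsent(index, n -> new TreeMap<>()).put(key, value);
        }

        /** Flushes each touched index once, however many times it was updated. */
        void commit() {
            for (Map.Entry<String, Map<String, String>> e : pending.entrySet()) {
                disk.computeIfAbsent(e.getKey(), n -> new TreeMap<>()).putAll(e.getValue());
                diskWrites++;
            }
            pending.clear();
        }

        /** Drops the buffered updates; the stored data is left untouched. */
        void abort() {
            pending.clear();
        }
    }

    Txn begin() {
        return new Txn();
    }
}
```

In this sketch, two updates to a hypothetical "rdn" index plus one update to another index cost
two simulated writes at commit instead of three. The actual saving depends on how many indexes
an operation touches more than once.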


I hope this clarifies the reasons why we started to develop Mavibot, even though it's not going
as fast as it should (well, at some point, we have a life and a day job, neither of which
lets us work as much as we would like on our favorite project).

I would end by telling everyone that this is an Open Source project, and anyone is very
welcome if they want to give a hand...

Thanks for 'listening'.