directory-dev mailing list archives

From Emmanuel Lécharny
Subject Bulkloader status
Date Tue, 11 Nov 2014 01:28:14 GMT
Hi guys,

I have been pretty busy these last 2 weeks working on Fortress
integration and on the LDAP API release (besides other things).

As of today, the Mavibot bulkloader, which had been put aside for days
with a few pieces of the algorithm still to implement, is working. I
have tested it with 1 000 000 elements, and it's all good. (It has also
been tested with many btrees of increasing sizes, to be sure we don't
miss any corner cases.)

The goals were:
- have the bulkloader be part of Mavibot, instead of having it in
Mavibot-Partition only.
- have it accept as many entries as possible. The previous
implementation could not handle millions of elements.
- have it use a limited amount of memory: we now keep only one page per
btree level.
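
To illustrate the "one page per btree level" idea, here is a minimal
sketch of a bottom-up build from already sorted keys. It is not the
actual Mavibot code: the class BottomUpBuilderSketch, its methods and
the use of plain String keys are all hypothetical, pages are only
counted instead of being written to disk, and the rightmost pages are
left underfull, which a real btree would probably have to fix.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Mavibot's classes: bottom-up build from sorted keys. */
final class BottomUpBuilderSketch {
    private final int pageSize;
    // The only pages held in memory: one open page per btree level (0 = leaves).
    private final List<List<String>> openPages = new ArrayList<>();
    // Number of pages completed at each level (stands in for writing them to disk).
    private final List<Integer> pagesWritten = new ArrayList<>();

    BottomUpBuilderSketch(int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be >= 2");
        }
        this.pageSize = pageSize;
    }

    /** Keys must arrive in sorted order. */
    void add(String key) {
        addAt(0, key);
    }

    private void addAt(int level, String key) {
        if (level == openPages.size()) {
            // First key ever seen at this level: open its page.
            openPages.add(new ArrayList<>());
            pagesWritten.add(0);
        }
        if (openPages.get(level).size() == pageSize) {
            flush(level);
        }
        openPages.get(level).add(key);
    }

    /** Completes the open page at this level and records its first key in the parent. */
    private void flush(int level) {
        List<String> page = openPages.get(level);
        openPages.set(level, new ArrayList<>());
        pagesWritten.set(level, pagesWritten.get(level) + 1);
        // A real loader would write the page to disk here.
        addAt(level + 1, page.get(0));
    }

    /** Flushes the remaining pages bottom-up; returns the page count per level. */
    List<Integer> finish() {
        for (int level = 0; level < openPages.size(); level++) {
            if (level == openPages.size() - 1) {
                // Top level: the remaining page is the root.
                pagesWritten.set(level, pagesWritten.get(level) + 1);
                openPages.get(level).clear();
            } else {
                flush(level);
            }
        }
        return pagesWritten;
    }
}
```

With a page size of 2 and the five keys a, b, c, d, e, finish() returns
[3, 2, 1]: three leaves, two nodes above them, and one root. At no point
does the builder hold more than one page per level in memory.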

In order to cope with the potentially huge number of elements we have to
sort before loading them into the btree, we use temporary files, each of
which can hold N sorted elements. Then we do a kind of merge sort, by
pulling the right element from one of the files. One can configure the
number of elements in each file, bearing in mind that they are sorted in
memory before being written. In my tests, above 16 384 elements per
file, I don't see a huge improvement.
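
The temporary-file approach described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
Mavibot implementation: the class ExternalSortSketch and its methods are
hypothetical, keys are plain Strings without line breaks, and each
temporary file is a simple text file with one key per line. The
chunkSize parameter plays the role of the configurable number of
elements per file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical sketch, not the Mavibot bulkloader: external sort with temp files. */
final class ExternalSortSketch {

    /** One open temporary file plus the next unread key from it. */
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            head = reader.readLine();
        }
    }

    /** Phase 1: sort chunks of at most chunkSize keys in memory, one temp file each. */
    static List<Path> writeSortedRuns(Iterator<String> input, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> chunk = new ArrayList<>(chunkSize);
        while (input.hasNext()) {
            chunk.add(input.next());
            if (chunk.size() == chunkSize || !input.hasNext()) {
                Collections.sort(chunk);
                Path file = Files.createTempFile("run", ".txt");
                Files.write(file, chunk, StandardCharsets.UTF_8);
                runs.add(file);
                chunk.clear();
            }
        }
        return runs;
    }

    /** Phase 2: k-way merge, always pulling the smallest pending key among the files. */
    static void merge(List<Path> runFiles, Consumer<String> sink) throws IOException {
        PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (Path file : runFiles) {
            Run run = new Run(file);
            if (run.head != null) heap.add(run); else run.reader.close();
        }
        while (!heap.isEmpty()) {
            Run run = heap.poll();
            sink.accept(run.head);              // a bulkloader would feed the btree here
            run.head = run.reader.readLine();
            if (run.head != null) heap.add(run); else run.reader.close();
        }
        for (Path file : runFiles) Files.deleteIfExists(file);
    }

    /** Convenience wrapper for small demos: collects the merged keys in a list. */
    static List<String> sort(List<String> keys, int chunkSize) {
        try {
            List<String> sorted = new ArrayList<>();
            merge(writeSortedRuns(keys.iterator(), chunkSize), sorted::add);
            return sorted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only one chunk is held in memory during the first phase, and only one
pending key per file during the merge, so the memory used depends on
chunkSize and on the number of files rather than on the total number of
elements. The sort() wrapper collects everything in a list for
convenience only; a loader would pass the btree as the sink instead.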

The perf tests I have done on my laptop show that I can load up to
56 600 tuples per second. Don't expect the same performance when it
comes to loading LDAP entries into the server! I suspect that it will
be 10 times slower (still, 5 000 entries per second added would be a
great improvement over what we have now).

There are still some steps to complete:
- multi-value support: it's all about bulkloading the values when we
have many. Should be easy to implement.
- use the bulkloader in the ApacheDS mavibot-partition.
- add a CLI for the mavibot bulkloader and the mavibot-partition bulkloader.
- add an in-memory bulkloader.
- clean up the code, which has many redundancies atm.

Anyway, it's making progress. I'll probably cut a mavibot release
tomorrow, which will allow me to cut an ApacheDS release too.

Thanks!
