directory-dev mailing list archives

From: Howard Chu <...@symas.com>
Subject: Re: Bulk Loader : some ideas...
Date: Sun, 17 Aug 2014 21:44:09 GMT
Emmanuel Lécharny wrote:
> On 17/08/14 22:05, Howard Chu wrote:
>> Emmanuel Lécharny wrote:
>>> On 17/08/14 17:07, Howard Chu wrote:
>>>> If we encounter an entry later in the LDIF that corresponds to one of
>>>> these missing DNs, the search in the RDN index will just return the
>>>> entryID we already assigned to it. We then remove the DN from the
>>>> missing DN list. The result is that the DB tables and entryIDs are
>>>> generated in DN order even if the entries aren't ordered in the LDIF.
>>>
>>> The problem with this approach is that you lose the entryUUID stored in
>>> the LDIF file (typically when you bulk load an extract taken from a
>>> replica: you want to keep this information).
>>
>> So create a stub entry with a provisional entryUUID, and overwrite the
>> stub entry with the real entryUUID if you encounter the real entry
>> later. Still far cheaper than multiple passes thru the LDIF file.
>
> It works in your case, not ours. Again, we need to order the master
> table by entry UUID, as we also need to create the RDN index at the
> same time. We can't pull one entry after another and push them into
> the master table, creating empty entries when we have missing
> parents; that just won't produce an ordered master table (the master
> table is a Btree<UUID, Entry>).

Then delete the stub entry and insert the new entry.
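A minimal single-pass sketch of the stub-entry idea, in Python. Everything in it is hypothetical: `MasterTable` stands in for the Btree<UUID, Entry> (a sorted list plus a dict, not a real B-tree), `dn_to_uuid` stands in for the RDN index, entries are already-parsed dicts with `dn` and `entryUUID` keys, and UUIDs are compared as strings. It shows that with ordinary keyed inserts the table stays in UUID order whatever order the inserts arrive in, so a stub can be deleted and the real entry inserted under its real entryUUID without a second pass over the LDIF.

```python
import uuid
from bisect import bisect_left, insort


class MasterTable:
    """Hypothetical stand-in for a Btree<UUID, Entry>: keys stay sorted
    no matter what order they are inserted in."""

    def __init__(self):
        self._keys = []    # sorted UUID strings
        self._values = {}  # UUID string -> entry dict

    def insert(self, key, entry):
        if key in self._values:
            raise KeyError(f"duplicate key {key}")
        insort(self._keys, key)
        self._values[key] = entry

    def delete(self, key):
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        del self._keys[i]
        del self._values[key]

    def items(self):
        return [(k, self._values[k]) for k in self._keys]


def parent_dn(dn):
    # Naive: everything after the first comma (escaped commas are ignored).
    return dn.partition(",")[2]


def bulk_load(entries, suffix):
    """Single pass over parsed LDIF entries. Missing ancestors get stub
    entries under provisional UUIDs until the real entry shows up."""
    table = MasterTable()
    dn_to_uuid = {}  # hypothetical stand-in for the RDN index
    stubs = set()    # DNs currently represented by a stub

    def ensure(dn):
        # Make sure dn exists, creating stubs for it and any missing ancestors.
        if dn in dn_to_uuid:
            return
        if dn != suffix and parent_dn(dn):
            ensure(parent_dn(dn))
        provisional = str(uuid.uuid4())
        table.insert(provisional, {"dn": dn, "stub": True})
        dn_to_uuid[dn] = provisional
        stubs.add(dn)

    for entry in entries:
        dn, real_uuid = entry["dn"], entry["entryUUID"]
        if dn in stubs:
            # The real entry arrived: delete the stub, insert under the real UUID.
            table.delete(dn_to_uuid[dn])
            stubs.discard(dn)
        elif dn in dn_to_uuid:
            raise ValueError(f"duplicate entry {dn}")
        elif dn != suffix and parent_dn(dn):
            ensure(parent_dn(dn))
        table.insert(real_uuid, entry)
        dn_to_uuid[dn] = real_uuid
    # Any DNs still in stubs never appeared in the LDIF.
    return table, stubs
```

In this sketch no entry stores its parent's UUID. If the RDN index refers to parents by UUID, children loaded while their parent was still a stub would point at the provisional UUID and would need rewriting when the stub is replaced; the sketch does not attempt that.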

> Obviously, if we had a side index of UUIDs pointing to the entries'
> offsets in the file, that would be a different story (but we would
> still have to sort the UUID index separately, as a whole).
>
> This is the reason we have two phases.

This sounds broken to me; that means if you try to load an LDIF from some 
other software that also includes entryUUIDs, but which are not generated in 
the order that you use, your master table will be in the wrong order.
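One way to make this objection concrete, assuming (this is not stated in the thread) that the two-phase load walks entries in entryUUID order and builds the RDN index as it goes, so every parent must sort ahead of its children. The hypothetical helper below walks entries sorted by entryUUID, compared as strings, and reports the first entry whose parent has not been seen yet. UUIDs generated elsewhere, for example random version 4 UUIDs, give no such guarantee.

```python
def first_orphan_in_uuid_order(entries, suffix):
    """Hypothetical check: visit entries sorted by entryUUID (as strings)
    and return the first DN whose parent has not been visited yet, or
    None if every parent sorts ahead of its children."""
    seen = set()
    for entry in sorted(entries, key=lambda e: e["entryUUID"]):
        dn = entry["dn"]
        parent = dn.partition(",")[2]  # naive parent DN, ignores escaping
        if dn != suffix and parent not in seen:
            return dn
        seen.add(dn)
    return None
```

For instance, if the suffix entry carries entryUUID `f47ac10b-58cc-4372-a567-0e02b2c3d479` and its child `ou=people` carries one starting with `0`, the child sorts first and the helper returns the child's DN.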

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
