directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: [ApacheDS] [JDBM Partition] Why it's a BAD idea to store the Entry + DN in the master table
Date Thu, 07 Aug 2008 07:06:30 GMT
Alex Karasulu wrote:
> Hi all,
>   
Hi Alex,
> The ServerEntry stores the DN of the entry.  I think this is good for better
> code organization.  However, storing the entry together with its DN into
> the master table is a very bad idea.  The DN should instead be managed in
> the NDN and DN indices.
>   
I think you are wrong. Storing the DN within the entry is a very good 
idea (tm) :)

And the DN should also be managed in the NDN and DN indices.
> The reason why I'm suggesting this is because modifyDN operations will be
> extremely cumbersome when performed on a DN with many children.
ModifyDN operations will be slow. So what? How many ModifyDN operations occur 
compared to Search operations? Storing the DN within entries was 
specifically done to speed up search operations, as it allows 
us to return a result in 2 accesses to the backend:
- an index access (on the DN index, which is why we need it, or on 
the index of any other attribute)
- and an access to the master table
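A minimal sketch of the two read paths, using plain Java maps as stand-ins for the JDBM indices and the master table. The class, field and attribute names are hypothetical, not ApacheDS code. When the DN is embedded, a search costs an index access plus a master table access; when it is not, resolving the DN adds a third access, to the DN index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a partition, NOT the ApacheDS/JDBM API.
public class SearchPathSketch {

    // Entry as stored in the master table; dn is null when the DN is not embedded.
    record StoredEntry(String dn, Map<String, String> attributes) {}

    // Entry as returned to the client, which always needs a DN.
    record ResultEntry(String dn, Map<String, String> attributes) {}

    final Map<String, Long> mailIndex = new TreeMap<>();   // attribute value -> entry id
    final Map<Long, StoredEntry> master = new HashMap<>(); // entry id -> stored entry
    final Map<Long, String> dnIndex = new HashMap<>();     // entry id -> DN
    int accesses;                                          // backend accesses of the last search

    ResultEntry searchByMail(String mail) {
        accesses = 0;
        Long id = mailIndex.get(mail);        // access 1: attribute index
        accesses++;
        if (id == null) {
            return null;
        }
        StoredEntry stored = master.get(id);  // access 2: master table
        accesses++;
        String dn = stored.dn();
        if (dn == null) {
            dn = dnIndex.get(id);             // access 3: DN index, only when the DN is not embedded
            accesses++;
        }
        return new ResultEntry(dn, stored.attributes());
    }

    public static void main(String[] args) {
        SearchPathSketch p = new SearchPathSketch();
        p.mailIndex.put("jdoe@example.com", 1L);
        p.master.put(1L, new StoredEntry("uid=jdoe,ou=People,dc=example,dc=com",
                Map.of("mail", "jdoe@example.com")));
        ResultEntry r = p.searchByMail("jdoe@example.com");
        System.out.println(r.dn() + " found in " + p.accesses + " accesses");
    }
}
```

The extra access only happens on the path where the stored entry carries no DN, which is the old layout.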

When we didn't have the DN stored within the entry, we needed one more 
access, to the DN index, just to get the DN of the entry being 
returned.

This was a costly operation, because it required log(N) DN 
comparisons (N = number of entries).
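To illustrate the log(N) claim, here is a small sketch that counts the comparisons performed by a single lookup in a balanced tree. `java.util.TreeMap` (a red-black tree) is only a stand-in for the JDBM BTree, and the DNs are synthetic; the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Sketch: count the DN comparisons performed by one lookup in a balanced tree.
// TreeMap (a red-black tree) is only a stand-in for the JDBM BTree here.
public class DnLookupCost {

    static long comparisons;

    static long comparisonsForLookup(int n) {
        // Comparator that counts every call; a real server compares normalized
        // DNs, which is likely more expensive than this plain String compare.
        Comparator<String> counting = (a, b) -> {
            comparisons++;
            return a.compareTo(b);
        };
        TreeMap<String, Long> dnIndex = new TreeMap<>(counting);
        for (int i = 0; i < n; i++) {
            dnIndex.put("uid=user" + i + ",ou=People,dc=example,dc=com", (long) i);
        }
        comparisons = 0; // count the lookup only, not the inserts
        dnIndex.get("uid=user" + (n / 2) + ",ou=People,dc=example,dc=com");
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            System.out.println(n + " entries: " + comparisonsForLookup(n) + " DN comparisons");
        }
    }
}
```

The count grows only logarithmically with the number of entries, but each step is a DN comparison, and the whole lookup is paid again for every entry returned.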

So, yes, ModifyDN has been slowed down big time, for the benefit of all 
the searches, and that is a price I am personally willing to pay.
>   It will
> require each child and the target entry to be retrieved from and written back
> to the master just to change its DN.  Plus we still have the updn and
> ndn indices which also get updated, so this is wasteful and causes a lot of
> unnecessary access operations.  Also note that we can store a lot more DNs
> in a cached JDBM page than we can entries.  So this will produce more memory
> consumption along with cache turnover.
>   
The memory waste is something we can manage. We are storing two kinds of 
data:
- trees
- DNs and entries

If we have to favor one of those two guys, it would be the trees. We can 
cache some data, but at some point, with millions of entries, you won't 
be able to keep more than a few of them in memory anyway. Whether the DN is 
cached or not is just a small part of the problem. (For a 1 KB entry, which 
is a small one, the DN is less than 10% of its size.)
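A back-of-envelope check of that figure, assuming a hypothetical 64 MB entry cache and a 100-byte DN inside a 1 KB entry. Both numbers are illustrative assumptions, not measurements.

```java
// Back-of-envelope sketch: how many entries fit in a fixed-size cache with
// and without the DN embedded. All sizes are assumptions, not measurements.
public class CacheFootprint {

    static long entriesThatFit(long cacheBytes, long entryBytes) {
        return cacheBytes / entryBytes;
    }

    public static void main(String[] args) {
        long cache = 64L * 1024 * 1024;    // hypothetical 64 MB entry cache
        long dnBytes = 100;                // DN size, just under 10% of 1 KB
        long withDn = 1024;                // entry with its DN embedded
        long withoutDn = withDn - dnBytes; // same entry without the DN

        long a = entriesThatFit(cache, withDn);
        long b = entriesThatFit(cache, withoutDn);
        System.out.printf("with DN: %d entries, without DN: %d entries (%.1f%% fewer)%n",
                a, b, 100.0 * (b - a) / b);
    }
}
```

With these numbers, embedding the DN costs just under 10% of the cache capacity, the same proportion as its share of the entry size.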

I don't think we should overstate the extra memory it takes to store the 
DN within the entry.

Anyway, if we don't do that, you immediately realize that you have to do 
another lookup on the DN index to get the DN of an entry you want to 
return to the client, an operation which may need disk access, many 
comparisons, etc.


> If the modifyDN operation changes the RDN of the target, a master table
> access is unavoidable because the target's RDN attribute in the entry must
> change. However the children of the target can avoid a master table
> read-write operation since their RDN attributes do not change.  This is
> again only avoidable if we do not store the DN in the master.  Ideally you
> just want to update the indices when entries are moved around.
>   
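To put numbers on the two layouts being compared, here is a rough count of the master-table entries a ModifyDN on a subtree root has to rewrite. Index updates are left out, since the updn and ndn indices are updated for every moved entry in both layouts. The class and enum names are hypothetical.

```java
// Sketch: how many master-table entries a ModifyDN on a subtree root must
// rewrite under two hypothetical storage layouts. Not ApacheDS code.
public class ModifyDnCost {

    enum Layout { DN_IN_ENTRY, DN_IN_INDICES_ONLY }

    static long masterWrites(Layout layout, long descendants, boolean rdnChanges) {
        if (layout == Layout.DN_IN_ENTRY) {
            // The target and every descendant carry a DN that embeds the old
            // parent DN, so each must be read, patched and written back.
            return 1 + descendants;
        }
        // Descendants only need their index records updated; the target entry is
        // rewritten only when its RDN attribute value changes.
        return rdnChanges ? 1 : 0;
    }

    public static void main(String[] args) {
        long users = 100_000_000L; // the ou=People -> ou=Users example
        for (Layout layout : Layout.values()) {
            System.out.println(layout + ": " + masterWrites(layout, users, true) + " master writes");
        }
    }
}
```

The gap is real for a huge subtree; the question is how often such a rename happens compared to searches.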
Granted. But I don't think we should optimize the server to make 
the ModifyDN operation as fast as possible. I don't think you will 
have more than a few ModifyDN operations (with children being moved) per 
year on a serious LDAP server instance.
> I've been against this drive to push the DN into the master table combined
> with the entry from day one along with the drive to remove the NDN and UPDN
> indices.  The obvious reason is due to these issues. 
You are fixing a bug in the Modify operation right now, and being focused 
on it, you are losing the big picture, I think. Step back, let's 
discuss the pros and cons with a global view, and maybe you will 
realize that it was a good move.
>  I just did not have
> the time to clarify exactly why until I started looking into this bug which
> was recently introduced:
>
>
>  *DIRSERVER-1224 <https://issues.apache.org/jira/browse/DIRSERVER-1224>*
> As I reviewed the code it was clear that this will cost much more on all the
> flavors of ModifyDN operations.  Just imagine a ModifyDN to rename ou=People
> to ou=Users if it contains 100M users in it.  I'd recommend we agree to fix
> this as recommended then I can push a JIRA on it so this can be fixed in the
> future (but before 2.0 since the correction will cause db
> incompatibilities).
>   
Again, this is shortsighted. You are focusing on extreme cases:
- server with 100M entries
- and applying a ModifyDN (a rare operation) which moves 100M entries

Anyway, you have a point here: ModifyDN will be awfully slow, just 
because we will have to rewrite the entire master table. If you think 
about what would have happened with the previous implementation, you 
would have had to rewrite the entire DN table instead. I would say that is 
only 2 or 3 times faster. Big deal! We are not talking about orders of 
magnitude here.

To sum up the advantages of having the DN within the entry, I would say 
that avoiding a DN lookup for a search will save a lot of time 
(not an order of magnitude, but say 20%) for _every_ search operation. I 
don't think people search using the DN often, compared to searches 
using another attribute to get the entry.

Let's start a discussion if needed. We could even add a switch on the 
server to tell it whether to store the DN within the entry or not, 
depending on which kind of operation will be the most frequent one: 
searches, or modifyDN with many children moved.
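If we went that way, the switch would most likely have to be a per-partition setting chosen when the partition is created, since it changes the on-disk format (the db incompatibility Alex mentions). A hypothetical sketch; the property name `dnStorage` and the enum are made up for illustration and are not existing ApacheDS options.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch of the proposed switch. The property name "dnStorage"
// and the enum below are made up for illustration; they are not ApacheDS options.
public class DnStorageSwitch {

    enum DnStorage {
        IN_ENTRY,   // DN stored with the entry: cheaper searches, costly subtree ModifyDN
        INDEX_ONLY  // DN kept only in the DN indices: one extra lookup per returned entry
    }

    // Reads the layout for a partition, defaulting to storing the DN in the entry.
    static DnStorage fromConfig(Properties props) {
        String value = props.getProperty("dnStorage", "IN_ENTRY");
        return DnStorage.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // The layout determines the on-disk format, so it can only be chosen when
    // the partition is created, not flipped on an existing one.
    static void checkCompatible(DnStorage onDisk, DnStorage configured) {
        if (onDisk != configured) {
            throw new IllegalStateException(
                    "partition was created with " + onDisk + ", config asks for " + configured);
        }
    }
}
```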

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org


