From: Emmanuel Lecharny
Date: Thu, 07 Aug 2008 09:06:30 +0200
To: Apache Directory Developers List
Subject: Re: [ApacheDS] [JDBM Partition] Why it's a BAD idea to store the Entry + DN in the master table

Alex Karasulu wrote:
> Hi all,
>
Hi Alex,

> The ServerEntry stores the DN of the entry. I think this is good for better
> code organization. However, storing the entry together with its DN into
> the master table is a very bad idea. The DN should instead be managed in
> the NDN and DN indices.
>
I think you are wrong. Storing the DN within the entry is a very good idea (tm) :) And the DN should also be managed in the NDN and DN indices.

> The reason why I'm suggesting this is because modifyDN operations will be
> extremely cumbersome when performed on a DN with many children.
>
ModifyDN operations will be slow. So what? How many ModifyDN operations will occur compared to Search operations?
Storing the DN within entries was done specifically to speed up the search operation, as it allows us to return a result with 2 accesses to the backend:
- an index access (this can be the DN index, which is why we need it, or the index of any other attribute)
- and an access to the master table

When we didn't have the DN stored within the entry, we needed one more access, to the DN index, just to get the DN required to return an entry. This was a costly operation, because it took log(N) DN comparisons (N = number of entries).

So, yes, ModifyDN has been slowed down big time, for the benefit of all searches, and that is a price I am personally willing to pay.

> It will
> require each child and the target entry to be retrieved and written to disk
> to-from the master just to change its DN. Plus we still have the updn and
> ndn indices which also get updated so this is wasteful and causes a lot of
> unnecessary access operations. Also note that we can store a lot more DNs
> in a cached JDBM page than we can entries. So this will produce more memory
> consumption along with cache turn over.
>
The memory waste is something we can manage. We are storing two kinds of data:
- trees
- DNs and entries

If we have to favor one of those two, it would be the trees. We can cache some data, but at some point, with millions of entries, you won't be able to keep more than a few of them in memory anyway. Whether the DN is cached or not is just a small part of the problem. (We can consider that for a 1 KB entry, which is a small one, the DN is less than 10% of its size.) I don't think we should overlook the extra memory it takes to store the DN within the entry. Anyway, if we don't do that, you immediately realize that you have to do another lookup on the DN index to get the DN of every entry you want to return to the client, an operation which may need disk access, many comparisons, etc.
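The access-count argument above can be illustrated with a toy model. This is a hypothetical sketch, not ApacheDS code: in-memory TreeMaps stand in for the JDBM B+trees, a single-valued cn index stands in for "any other attribute" index, and the names SearchPathSketch, searchByCn and dnInEntry are illustrative only. It requires Java 16+ for records. Each get() is counted as one backend access; in a real partition each would be an O(log N) tree descent, possibly hitting disk.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model of the two search read paths (not ApacheDS code).
 * TreeMaps stand in for JDBM B+trees; every get() counts as one backend access.
 */
public class SearchPathSketch {

    /** A master-table record: attributes, plus the DN only when it is stored in the entry. */
    record StoredEntry(String dn, Map<String, String> attributes) {}

    private final TreeMap<Long, StoredEntry> master = new TreeMap<>(); // id -> entry
    private final TreeMap<String, Long> cnIndex = new TreeMap<>();     // cn value -> id
    private final TreeMap<Long, String> dnIndex = new TreeMap<>();     // id -> DN (reverse DN index)
    private final boolean dnInEntry;
    private int accesses;

    SearchPathSketch(boolean dnInEntry) {
        this.dnInEntry = dnInEntry;
    }

    void add(long id, String dn, String cn) {
        master.put(id, new StoredEntry(dnInEntry ? dn : null, Map.of("cn", cn)));
        cnIndex.put(cn, id);
        dnIndex.put(id, dn);
    }

    /** Evaluates (cn=value) and returns the DN to send back to the client, or null if no match. */
    String searchByCn(String cn) {
        accesses = 1;                          // access 1: attribute index
        Long id = cnIndex.get(cn);
        if (id == null) {
            return null;
        }
        accesses++;                            // access 2: master table
        StoredEntry entry = master.get(id);
        if (entry.dn() != null) {
            return entry.dn();                 // DN stored with the entry: no further lookup
        }
        accesses++;                            // access 3: extra DN index lookup
        return dnIndex.get(id);
    }

    int lastAccessCount() {
        return accesses;
    }

    public static void main(String[] args) {
        for (boolean dnInEntry : new boolean[] {true, false}) {
            SearchPathSketch partition = new SearchPathSketch(dnInEntry);
            partition.add(1L, "cn=alice,ou=People,dc=example,dc=com", "alice");
            String dn = partition.searchByCn("alice");
            System.out.println("dnInEntry=" + dnInEntry + ": " + dn
                + " found in " + partition.lastAccessCount() + " accesses");
        }
    }
}
```

Running main prints 2 accesses for the dnInEntry=true layout and 3 for dnInEntry=false. The model only counts lookups; it says nothing about the cost of each one, which is where the DN comparisons mentioned above come in.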
> If the modifyDN operation changes the RDN of the target, a master table
> access is unavoidable because the target's RDN attribute in the entry must
> change. However the children of the target can avoid a master table
> read-write operation since their RDN attributes do not change. This is
> again only avoidable if we do not store the DN in the master. Ideally you
> just want to update the indices when entries are moved around.
>
Granted. But I don't think we should optimize the server to make the ModifyDN operation as fast as possible. I don't think you will see more than a few ModifyDN operations (with children being moved) per year on a serious LDAP server instance.

> I've been against this drive to push the DN into the master table combined
> with the entry from day one along with the drive to remove the NDN and UPDN
> indices. The obvious reason is due to these issues.
>
You are just fixing a bug in the Modify operation right now, and being focused on it, you are losing the whole picture, I think. Step back, let's discuss the pros and cons with a global vision, and maybe you will realize that it was a good move.

> I just did not have
> the time to clarify exactly why until I started looking into this bug which
> was recently introduced:
>
> DIRSERVER-1224
>
> As I reviewed the code it was clear that this will cost much more on all the
> flavors of ModifyDN operations. Just imagine a ModifyDN to rename ou=People
> to ou=Users if it contains 100M users in it. I'd recommend we agree to fix
> this as recommended then I can push a JIRA on it so this can be fixed in the
> future (but before 2.0 since the correction will cause db
> incompatibilities).
>
Again, this is shortsighted. You are focusing on extreme cases:
- a server with 100M entries
- and applying a ModifyDN (a rare operation) which moves 100M entries

Anyway, you have a point here: ModifyDN will be awfully slow, just because we will have to rewrite the entire master table.
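To make the ModifyDN cost discussed above concrete, here is a hypothetical write-count sketch of renaming a subtree (for example ou=People to ou=Users) under both layouts. It is not ApacheDS code: a single TreeMap stands in for the DN/NDN indices (collapsed into one for brevity), another for the master table, and the dnInEntry constructor flag is an assumed stand-in for a per-partition storage option. A master record counts as rewritten only when its stored content changes; the target's record always changes because its RDN attribute changes, while a descendant's record changes only if it embeds the full DN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical write-count model of a subtree ModifyDN (not ApacheDS code).
 * One TreeMap stands in for the DN/NDN indices, another for the master table.
 */
public class ModifyDnCostSketch {

    private final boolean dnInEntry;                               // hypothetical storage switch
    private final TreeMap<String, Long> dnIndex = new TreeMap<>(); // DN -> entry id
    private final TreeMap<Long, String> master = new TreeMap<>();  // id -> serialized record
    private int masterWrites;
    private int indexWrites;

    ModifyDnCostSketch(boolean dnInEntry) {
        this.dnInEntry = dnInEntry;
    }

    void add(long id, String dn) {
        dnIndex.put(dn, id);
        master.put(id, record(dn));
    }

    /** Master-table record: always holds the RDN attribute, holds the full DN only if dnInEntry. */
    private String record(String dn) {
        int comma = dn.indexOf(',');
        String rdn = comma < 0 ? dn : dn.substring(0, comma);
        return dnInEntry ? "dn=" + dn + "|" + rdn : rdn;
    }

    /** Renames oldBase to newBase and moves every descendant along with it. */
    void renameSubtree(String oldBase, String newBase) {
        masterWrites = 0;
        indexWrites = 0;
        // Collect affected entries first; a real index would not scan every DN like this.
        List<Map.Entry<String, Long>> moved = new ArrayList<>();
        for (Map.Entry<String, Long> e : dnIndex.entrySet()) {
            String dn = e.getKey();
            if (dn.equals(oldBase) || dn.endsWith("," + oldBase)) {
                moved.add(Map.entry(dn, e.getValue()));
            }
        }
        for (Map.Entry<String, Long> e : moved) {
            String oldDn = e.getKey();
            long id = e.getValue();
            String newDn = oldDn.substring(0, oldDn.length() - oldBase.length()) + newBase;
            dnIndex.remove(oldDn);                   // the DN index is rewritten in both layouts
            dnIndex.put(newDn, id);
            indexWrites++;
            String newRecord = record(newDn);
            if (!newRecord.equals(master.get(id))) { // master rewrite only if the record changes
                master.put(id, newRecord);
                masterWrites++;
            }
        }
    }

    int masterWrites() { return masterWrites; }

    int indexWrites() { return indexWrites; }

    Long idOf(String dn) { return dnIndex.get(dn); }

    public static void main(String[] args) {
        for (boolean dnInEntry : new boolean[] {true, false}) {
            ModifyDnCostSketch partition = new ModifyDnCostSketch(dnInEntry);
            partition.add(1L, "ou=People,dc=example,dc=com");
            for (long i = 2; i <= 1001; i++) {
                partition.add(i, "cn=user" + i + ",ou=People,dc=example,dc=com");
            }
            partition.renameSubtree("ou=People,dc=example,dc=com", "ou=Users,dc=example,dc=com");
            System.out.println("dnInEntry=" + dnInEntry
                + ": master writes=" + partition.masterWrites()
                + ", DN index writes=" + partition.indexWrites());
        }
    }
}
```

With 1 target and 1000 children, main reports 1001 master writes for dnInEntry=true and a single master write for dnInEntry=false, while the DN index takes 1001 writes in both cases. Embedding the DN therefore adds one master-table rewrite per moved descendant on top of index work that does not go away in either layout.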
If you think about what would have happened with the previous implementation, you would only have had to rewrite the entire DN table. I would say that is simply 2 or 3 times faster. Big deal! We are not talking about orders of magnitude here.

To sum up the advantages of having the DN within the entry: avoiding a DN lookup saves a lot of time on _every_ search operation (not an order of magnitude, but say 20%). I don't think people search by DN often, compared to searches that use another attribute to find the entry.

Let's start a discussion if needed. We could even add a switch to tell the server whether or not to store the DN within the entry, depending on which kind of operation will be the most frequent one: searches, or ModifyDN operations with many children moved.

-- 
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org