From: Alex Karasulu
To: Apache Directory Developers List, elecharny@apache.org
Subject: Re: Update about subtree problems
Date: Tue, 20 Jul 2010 11:53:19 +0300
In-Reply-To: <4C455B17.4030709@gmail.com>
References: <4C44E117.6040509@gmail.com> <4C45504F.5090507@symas.com> <4C455B17.4030709@gmail.com>

On Tue, Jul 20, 2010 at 11:15 AM, Emmanuel Lecharny wrote:

> Hi Howard,
>
> On 7/20/10 9:29 AM, Howard Chu wrote:
>
>>> Some side note:
>>>
>>> After having done some perf tests on the evaluator, and applied some
>>> improvements, I can tell that, depending on the number of subentries an
>>> entry depends on, the cost of this evaluation can go up to 50% of
>>> the cost of the search itself (not counting the network layer). For
>>> instance, evaluating a subtreeSpecification with a min and a max, no
>>> chop, can be done up to 1 000 000 times per second on a 3-level DN
>>> (this all depends on the DN size).
>>
>> IMO, the considerations here are the same as for the O(1) rename. I.e.,
>> when you remove the entryDN from the entry in the DB, you have to
>> calculate the DN on the fly, and it certainly is a frequently referenced
>> datum. You make this cheap by caching the entryDN in memory, and it's
>> very clear when a cached DN must be invalidated - most of the time the
>> cached value will not change.
>
> The DN cache is most certainly needed for faster operations. Building a DN
> on the fly for every entry is one of the most costly operations, so if we
> can speed it up with a cache, it's a net gain. Having the DN in the entry,
> OTOH, is not necessarily a big gain: you still have to deserialize it if
> it's not in cache, and this is also costly.
>
> Obviously, all those considerations fall into a big dark hole if you have
> a decent entry cache, as the entries in memory already store the full DN...
> Any modification like a rename or a move will of course invalidate the
> entries in this cache.
>
> All in all, in most cases, you don't have to do all those computations...
>
> Regarding the subtree handling, it's different, as you can't avoid the
> entry evaluation if the entries don't contain the reference to the
> subentry they depend upon. This evaluation can be costly, to the point
> where it is more expensive than fetching the entry itself.
>
> The rationale behind the choice I made 3 years ago (and which was
> reverted) to put the DN into the entry was just to speed up any search by
> avoiding costly computation, at the price of costly but infrequent
> operations like Move or Rename (MODDN).
>
> If you have to move data in an LDAP base, User, then you have to pay the
> price!

Well, yes, but even renames cost the same as moves if the DN is in the
entry. Someone renaming an ou=People containing 100 million entries to
ou=Users should not have to wait hours for it to complete. Plus the
atomicity issue is seriously nasty. The DN embedded into the Entry was
definitely not the way to go. In fact, Kiran and Seelmann's new RDN index,
which replaces the DN index, saved us big time by making these operations
atomic, faster and safer.
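
To illustrate the rename argument, here is a minimal Python sketch of the idea, not the ApacheDS implementation. It assumes a hypothetical RdnIndex in which each record holds only the entry's own RDN and its parent's ID, plus an in-memory DN cache of the kind Howard describes. All class, method and field names are made up for the example, and clearing the whole cache on rename is a simplification (a real server would presumably invalidate only the affected subtree).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """One record of a hypothetical RDN index: own RDN plus parent ID."""
    rdn: str
    parent_id: Optional[int]  # None for the suffix (root) entry


class RdnIndex:
    """Toy RDN index with a DN cache (illustrative names only)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.dn_cache: Dict[int, str] = {}
        self.writes = 0  # number of index records written so far

    def add(self, entry_id: int, rdn: str, parent_id: Optional[int] = None) -> None:
        self.nodes[entry_id] = Node(rdn, parent_id)
        self.writes += 1

    def dn(self, entry_id: int) -> str:
        """Return the DN, building it on the fly only on a cache miss."""
        cached = self.dn_cache.get(entry_id)
        if cached is not None:
            return cached
        parts = []
        current: Optional[int] = entry_id
        while current is not None:  # walk up to the suffix entry
            node = self.nodes[current]
            parts.append(node.rdn)
            current = node.parent_id
        result = ",".join(parts)
        self.dn_cache[entry_id] = result
        return result

    def rename(self, entry_id: int, new_rdn: str) -> None:
        """Rename touches one record, however many descendants exist."""
        self.nodes[entry_id].rdn = new_rdn
        self.writes += 1
        # Simplification: drop every cached DN instead of only the subtree.
        self.dn_cache.clear()


index = RdnIndex()
index.add(1, "dc=example,dc=com")
index.add(2, "ou=People", parent_id=1)
for i in range(3, 1003):  # 1000 children under ou=People
    index.add(i, f"uid=user{i}", parent_id=2)

writes_before = index.writes
index.rename(2, "ou=Users")
print(index.writes - writes_before)  # prints 1
print(index.dn(3))  # prints uid=user3,ou=Users,dc=example,dc=com
```

With the DN stored inside each entry, the same rename would have to rewrite all 1000 child entries; here only the ou record changes and the children's DNs are simply recomputed (and re-cached) on the next read.
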
--
Alex Karasulu
My Blog :: http://www.jroller.com/akarasulu/
Apache Directory Server :: http://directory.apache.org
Apache MINA :: http://mina.apache.org
To set up a meeting with me: http://tungle.me/AlexKarasulu
