From: Emmanuel Lecharny
Reply-To: elecharny@apache.org
To: Apache Directory Developers List
Subject: Re: Subentries handling refactoring
Date: Thu, 15 Jul 2010 21:21:58 +0200
Message-ID: <4C3F5FD6.4020801@gmail.com>
References: <4C3F4675.7000506@gmail.com>

On 7/15/10 9:03 PM, Kiran Ayyagari wrote:
> have not touched this part of the server so here are some (naive) questions
>
> On Thu, Jul 15, 2010 at 11:03 PM, Emmanuel Lecharny wrote:
>> Hi guys,
>>
>> we have serious issues with the way we manage subentries in the server. Not
>> that it's not working, but it's certainly not good enough for anything but a
>> toy server.
>>
>> Let me first give some heads up about what's going on.
>>
>> A subentry is associated with an AdministrativePoint (AP), and defines a
>> selection of entries which will be affected depending on the AP role. Those
>> roles are:
>> - Access Control
>> - Collective Attributes
>> - SubSchema (not active atm)
>> - Triggers (ADS specific).
>>
>> For instance, if we have a tree with a set of entries associated with a
>> location (e.g., c=France), we may define a subentry with a Collective
>> Attribute role telling the server that every entry under the c=France branch
>> will have a specific attribute added. We then don't have to add this
>> attribute to *every* entry in this branch...
>>
>> Anyway...
>>
>> A subentry defines a selection using a filter, and a base DN for this filter
>> to be active from.
>>
>> Right now, a Subentry is attached to an AP as a (quite) normal entry, and
>> when we add this subentry, *all* the selected entries (using the subentry
>> filter and the base DN) are modified to have a new attribute added.
> AFAIU we update all the entries with this new info, do we really need
> to do this?

Of course not.

> is it done that way to avoid any search evaluation costs?

No, I think it was done this way because it was the easiest way to make it
work, as a preliminary approach. Sometimes, you have to balance between good
and best.

>> This added attribute contains a DN pointing to the associated
>> subentry, so that when we process this entry, we can immediately know that
>> it's associated with an AP.
>>
>> So far, so good: processing an entry is fast, as we have all we need
>> when we have the entry. But the dark side is that if we have millions of
>> entries, when we add an AP and a subentry, we may have to modify potentially
>> *millions* of entries to add this attribute. Not good...
>>
>> How can we improve this process?
>>
>> The idea would be to search for the APs when we process an entry, but it has
>> to be fast. How can we do that? Simple: we use the entry's DN and, using a
>> DN cache, we can get all the APs associated with an entry knowing its DN.
>> It's as costly as the depth of the entry's DN. Once we have grabbed the APs,
>> we will have to evaluate the entry to know if it's part of the selections
>> defined by the APs' subentries. Done.
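
To make the lookup concrete, here is a rough sketch of the DN walk described above. It is only an illustration under assumptions: the ApCache class, its method names and the naive string-based DN handling (lower-casing, no support for escaped commas) are all hypothetical and are not taken from the ApacheDS code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative only.
class ApCache {
    // Normalized AP DN -> DNs of the subentries attached to that AP.
    private final Map<String, List<String>> subentriesByAp = new HashMap<>();

    // Record that a subentry is attached to the given AP.
    void register(String apDn, String subentryDn) {
        subentriesByAp
            .computeIfAbsent(normalize(apDn), k -> new ArrayList<>())
            .add(subentryDn);
    }

    // Walk from the entry's DN up to the root, dropping one RDN per step.
    // This costs one map lookup per RDN, i.e. the depth of the DN.
    List<String> findAdministrativePoints(String entryDn) {
        List<String> found = new ArrayList<>();
        String dn = normalize(entryDn);
        while (!dn.isEmpty()) {
            if (subentriesByAp.containsKey(dn)) {
                found.add(dn);
            }
            int comma = dn.indexOf(',');
            dn = (comma < 0) ? "" : dn.substring(comma + 1);
        }
        return found;
    }

    // Subentries attached to an AP (empty list if the DN is not a known AP).
    List<String> subentriesOf(String apDn) {
        return subentriesByAp.getOrDefault(normalize(apDn), new ArrayList<>());
    }

    // Simplifying assumption: lower-case the DN and strip spaces around
    // commas. Real DN normalization is schema-aware and handles escaping.
    static String normalize(String dn) {
        return dn.trim().toLowerCase(Locale.ROOT).replaceAll("\\s*,\\s*", ",");
    }
}
```

For an entry such as cn=Jean,ou=People,c=France,dc=example,dc=com, this does five map lookups, one per RDN. The subentries of each AP found would then still have to be evaluated against the entry (base DN and filter), which this sketch leaves out.
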
>>
>> Is it costly? Only marginally compared to the current algorithm, as we have
>> to look up the APs, whereas the current server has this list in an
>> attribute of the entry. But we spare the big modifications when adding - or
>> removing/renaming/moving - a subentry.
>>
>> What we just need is an AP cache and a way to process it.
>>
>> This is what I will work on in the next few days if nobody objects or finds a
>> better algorithm.
> how about having an index on subEntry instead of a cache, one immediate
> advantage I see is
> that we don't need to cache DNs which will help in minimizing the
> modify() operation's time.

We just cache the APs' DNs, and we won't have many. We will also have an index
on subentries, but this is different. When we init the system, we will load
all the subentries into the cache, using the index to get them from the
backend.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com