Return-Path: Delivered-To: apmail-directory-dev-archive@www.apache.org Received: (qmail 21335 invoked from network); 28 Mar 2009 10:27:06 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 28 Mar 2009 10:27:06 -0000 Received: (qmail 12090 invoked by uid 500); 28 Mar 2009 10:27:05 -0000 Delivered-To: apmail-directory-dev-archive@directory.apache.org Received: (qmail 11998 invoked by uid 500); 28 Mar 2009 10:27:05 -0000 Mailing-List: contact dev-help@directory.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: "Apache Directory Developers List" Delivered-To: mailing list dev@directory.apache.org Received: (qmail 11990 invoked by uid 99); 28 Mar 2009 10:27:05 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 28 Mar 2009 10:27:05 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of akarasulu@gmail.com designates 209.85.217.171 as permitted sender) Received: from [209.85.217.171] (HELO mail-gx0-f171.google.com) (209.85.217.171) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 28 Mar 2009 10:26:58 +0000 Received: by gxk19 with SMTP id 19so3195057gxk.1 for ; Sat, 28 Mar 2009 03:26:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=nPSGBiRJ82QiDhYrEYdHUODpgnUYrvKfRGv1rxUVnnc=; b=xLG8Lp8H4RzwLHq4Yc3XdhNuMbaeIUszURb6YRdAg5TfcuxhHpjXEI6B/cjO+8eyJa dUdOKfilKHOlXI0Kw0sx7puWajqmUdKU4sfiEhzY9S/lPGK65nK/tY7eil8cVwsTPni5 5BKvBEIGK2R5/Z+Mg8iYXoH1bUMtCbJcfd0cM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=FQZ7ayBuoT9ePEAoVoipWClG9Q5fXOCCTmeteCqpCN1ufm5m+t8Kao/QfIqsz4njtU 823ZEQwEktD62FzJ+oPSGCFlqMZ5AqTwlIk83VKoMGhQXAw+/flEbycpTd/suNsyR3W5 Q9OtzgtCUSwJlCUShiWz9T9buDt3HLCUdQSJk= MIME-Version: 1.0 Received: by 10.231.30.67 with SMTP id t3mr674320ibc.21.1238235996077; Sat, 28 Mar 2009 03:26:36 -0700 (PDT) In-Reply-To: References: Date: Sat, 28 Mar 2009 11:26:36 +0100 Message-ID: Subject: Re: [ApacheDS] A solution for the subentry association performance problem From: Alex Karasulu To: Apache Directory Developers List Content-Type: multipart/alternative; boundary=0003255760b2b1c0f104662b4884 X-Virus-Checked: Checked by ClamAV on apache.org --0003255760b2b1c0f104662b4884 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On Sat, Mar 28, 2009 at 7:46 AM, Ersin ER wrote: > Hi, > > The problem you mentioned and many more (collective attrs, dynamic groups) > of that type are related to computed attributes. The solution is to develop > a Virtual Attribute Computation layer on top of partitions. Exactly! I was thinking of some mechanisms to enable this. Namely the idea of some kind of virtual subsystem came to mind which would be passed into partitions for their utilization in search and lookup operations. First virtual attributes must be injected into entries when they are looked up. Second some kind of virtual Index needs to be presented/exposed to allow a search engine implementation to be able to correctly apply a filter on these non-physical attributes. > Of course it's not that easy to implement as it's said but that's something > we strongly need for many other features we want to implement. I had > reported this issue here: > > http://issues.apache.org/jira/browse/DIRSERVER-1067 > > Although I do not have a detailed solution proposal what I want to say is > that it's a more general problem which blocks us implementing other nice > features. In this specific case where a subentry index is used I was thinking it's an implementation detail that I call it an index. You just need to expose mappings and allow partitions to access them. A btree works well in this case to store precomputed values of subtree specification evaluation. In this specific example you are just caching computed attributes, but essentially these are just as virtual and computed as other kinds of virtual attributes we've envisioned. Alex > > > On Sat, Mar 28, 2009 at 03:57, Alex Karasulu wrote: > >> Hi all, >> >> >> Not a dig at anyone for supporting DN presence in entries. >> >> >> Similar Problem We Are Familiar With >> -------------------------------------------------------------- >> >> During this ApacheCon we had discussions around the problem of embedding >> the DN in an entry. Presently the DN is maintained in an entry and in the >> DN index. This makes it very expensive to perform modifyDn operations. The >> classic scenario of this is when a parent entry containing millions of >> descendants has it's DN modified. The result is devastating: >> >> (1) millions of entries are looked up from the master table >> (a) millions of disk blocks are accessed >> (b) millions of entries are deserialized >> (2) the entry cache turns over loosing it's cache hit memory >> (3) the operation can take minutes if not hours >> >> >> The Problem With Subentry Changes >> ------------------------------------------------------------ >> >> Almost the same exact problem is causing the subentry performance issue we >> have been observing. When a subentry is created, all entries selected by >> the subentry's subtree specification are altered to point to that subentry >> for it's administrative purpose. For example, if the subentry is for access >> control then the accessControlSubentries multi-valued attribute has the DN >> of the subentry added to any entry selected by the subentry's >> subtreeSpecification. This attribute is persisted in the entry within the >> master table. When the subentry is removed, renamed, or it's subtree >> specification is altered, all the old entries selected previously are >> recalled from the master table to remove the old reference. Then the new >> reference is added to all entries selected by the new filter. >> >> This process is very expensive as is the modifyDN issue because of the >> massive number of master table read-write operations. As with the modifyDN >> problem a specialized index could minimize the impact of these kinds of >> alterations on large sets of entries. >> >> >> The Solution >> -------------------- >> >> First a special [subentry ID<->entry ID] index will be maintained. This >> maps the ID of a subentry to the ID of the entry the subentry selects via >> it's subtree specification. Note we're not differentiating between the >> different kinds of subentries. When an entry is looked up specifically >> asking for a subentry association attribute like accessControlSubentries >> then we can inject this attribute into the entry before returning it. To >> compute the contents of the accessControlSubentries attribute all we have to >> do is walk this index in the opposite direction for the entry ID. For each >> subentry all we have to do is check the objectClass index to see if it is an >> accessControlSubentry. If it is then the DN of that entry is added to the >> accessControlSubentries attribute. Since subentry reference attributes are >> all operational and normally are not returned unless explicitly asked for, >> we seldom need to add these attributes nor do we need to access this index. >> >> Presently the authorization/subtree subsystem will inject these attributes >> into entries before pumping them into the partition. This means that we can >> maintain this index by finding these subentry association attributes and >> using their information before deleting them. >> >> Issues >> ---------- >> >> I don't like this solution since it puts more responsibility on the >> partition implementation to consistently adhere to these semantics. It also >> presumes every partition will have the ability to create this index. This >> makes me uncomfortable. However this would make these server faster in it's >> ability to respond to subentry changes. All the changes would occur on this >> index instead of on the master table. >> >> Thoughts? >> >> Alex >> >> >> >> > > > -- > Ersin ER > http://www.ersiner.net > --0003255760b2b1c0f104662b4884 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

On Sat, Mar 28, 2009 at 7:46 AM, Ersin E= R <ersin.er@gmail.com> wrote:
Hi,

The problem you mentioned and many more (collective attrs, dynam= ic groups) of that type are related to computed attributes. The solution is= to develop a Virtual Attribute Computation layer on top of partitions.

Exactly! I was thinking of some mechanisms to enable this.=A0 Name= ly the idea of some kind of virtual subsystem came=A0 to mind which would b= e passed into partitions for their utilization in search and lookup operati= ons.=A0 First virtual attributes must be injected into entries when they ar= e looked up.=A0 Second some kind of virtual Index needs to be presented/exp= osed to allow a search engine implementation to be able to correctly apply = a filter on these non-physical attributes.
=A0
Of cours= e it's not that easy to implement as it's said but that's somet= hing we strongly need for many other features we want to implement. I had r= eported this issue here:

http://issues.apache.org/jira/browse/DIRSERVER-1067

= Although I do not have a detailed solution proposal what I want to say is t= hat it's a more general problem which blocks us implementing other nice= features.

In this specific case where a subentry index is used I was thinkin= g it's an implementation detail that I call it an index.=A0 You just ne= ed to expose mappings and allow partitions to access them.=A0 A btree works= well in this case to store precomputed values of subtree specification eva= luation.=A0 In this specific example you are just caching computed attribut= es, but essentially these are just as virtual and computed as other kinds o= f virtual attributes we've envisioned.

Alex
=A0


On Sat, Mar 28, 2009 at 03:57, Alex Karasulu= <akarasulu@apache.org> wrote:
Hi all,

<note>
Not a dig at anyone for supporting DN presen= ce in entries.
</note>

Similar Problem We Are Familiar With=
--------------------------------------------------------------

During this ApacheCon we had discussions around the problem of embedding th= e DN in an entry.=A0 Presently the DN is maintained in an entry and in the = DN index.=A0 This makes it very expensive to perform modifyDn operations.= =A0 The classic scenario of this is when a parent entry containing millions= of descendants has it's DN modified.=A0 The result is devastating:

(1) millions of entries are looked up from the master table
=A0=A0= =A0=A0 (a) millions of disk blocks are accessed
=A0=A0=A0=A0 (b) million= s of entries are deserialized
(2) the entry cache turns over loosing it&= #39;s cache hit memory
(3) the operation can take minutes if not hours


The Problem With= Subentry Changes
------------------------------------------------------------

Almost the same exact problem is causing the subentry performance issue= we have been observing.=A0 When a subentry is created, all entries selecte= d by the subentry's subtree specification are altered to point to that = subentry for it's administrative purpose.=A0 For example, if the subent= ry is for access control then the accessControlSubentries multi-valued attr= ibute has the DN of the subentry added to any entry selected by the subentr= y's subtreeSpecification.=A0 This attribute is persisted in the entry w= ithin the master table.=A0 When the subentry is removed, renamed, or it'= ;s subtree specification is altered, all the old entries selected previousl= y are recalled from the master table to remove the old reference.=A0 Then t= he new reference is added to all entries selected by the new filter.

This process is very expensive as is the modifyDN issue because of the = massive number of master table read-write operations.=A0 As with the modify= DN problem a specialized index could minimize the impact of these kinds of = alterations on large sets of entries.


The Solution
--------------------

First a special [subent= ry ID<->entry ID] index will be maintained.=A0 This maps the ID of a = subentry to the ID of the entry the subentry selects via it's subtree s= pecification.=A0 Note we're not differentiating between the different k= inds of subentries.=A0 When an entry is looked up specifically asking for a= subentry association attribute like accessControlSubentries then we can in= ject this attribute into the entry before returning it.=A0 To compute the c= ontents of the accessControlSubentries attribute all we have to do is walk = this index in the opposite direction for the entry ID.=A0 For each subentry= all we have to do is check the objectClass index to see if it is an access= ControlSubentry.=A0 If it is then the DN of that entry is added to the acce= ssControlSubentries attribute.=A0 Since subentry reference attributes are a= ll operational and normally are not returned unless explicitly asked for, w= e seldom need to add these attributes nor do we need to access this index.<= br>
Presently the authorization/subtree subsystem will inject these attribu= tes into entries before pumping them into the partition.=A0 This means that= we can maintain this index by finding these subentry association attribute= s and using their information before deleting them.
=A0=A0=A0
Issues
----------

I don't like this solution si= nce it puts more responsibility on the partition implementation to consiste= ntly adhere to these semantics.=A0 It also presumes every partition will ha= ve the ability to create this index. This makes me uncomfortable.=A0 Howeve= r this would make these server faster in it's ability to respond to sub= entry changes.=A0 All the changes would occur on this index instead of on t= he master table.

Thoughts?

Alex






--
Ersin ER
http://www.ersiner.net

--0003255760b2b1c0f104662b4884--