From: "Alex Karasulu" <akarasulu@gmail.com>
To: "Apache Directory Developers List" <dev@directory.apache.org>, elecharny@iktek.com
Date: Sun, 23 Mar 2008 14:28:13 -0400
Subject: Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

On Sun, Mar 23, 2008 at 5:35 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
> Hi Alex,
>
> ...
>
> We have to extend this [client] API to cover all the LDAP
> operations (connect, disconnect, send and receive messages, controls,
> etc.)

I'd love to base our delegated authentication and proxy capabilities on this new API rather than polluting the server with more JNDI code.  I would also rather see the GSoC project on LDAP Object Mapping use this API instead of JNDI.  I think I'm done with JNDI forever - I can't stand it anymore.
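To make that concrete, here is a very rough sketch of the shape I have in mind for the client API.  None of these types exist yet - LdapConnection, SearchScope and LdapException are placeholder names I just made up - so treat this as a doodle, not a proposal:

// Placeholder sketch only: LdapConnection, SearchScope and LdapException are
// made-up names; Cursor and Entry are the shared types already discussed on
// this thread.
public interface LdapConnection
{
    void connect( String host, int port ) throws LdapException;

    void bind( String dn, String credentials ) throws LdapException;

    // Search hands back a Cursor over entries instead of a NamingEnumeration,
    // so a caller can advance, rewind or reposition over the results.
    Cursor<Entry> search( String baseDn, String filter, SearchScope scope )
        throws LdapException;

    void unBind() throws LdapException;

    void close() throws LdapException;
}

The point is simply that search() returns a Cursor rather than a NamingEnumeration, so nothing JNDI leaks into the server or into clients built on top of it.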
 
...
 

> > The Partition interface in ApacheDS will soon expose search results by
> > returning a Cursor<ServerEntry> instead of a NamingEnumeration.
>
> It will be Cursor<Entry> (as this is the top level interface)

I think you might have read the line above wrong: the Partition interface is inside the server, so the Cursors it returns would traverse ServerEntry objects, not Entry objects.
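For anyone who hasn't looked at the branch, the positioning methods I keep referring to look roughly like this - a trimmed-down sketch, not the exact interface we will ship:

// Trimmed-down sketch of the Cursor contract; the real interface has a few
// more methods, but these are the ones that matter for this discussion.
public interface Cursor<E>
{
    boolean next() throws Exception;        // advance to the next element
    boolean previous() throws Exception;    // step back to the previous element

    void beforeFirst() throws Exception;    // jump before the first element
    void afterLast() throws Exception;      // jump after the last element

    void before( E element ) throws Exception;  // position just before an element
    void after( E element ) throws Exception;   // position just after an element

    E get() throws Exception;               // the element at the current position
    void close() throws Exception;
}

So a Partition search ends up as a simple loop: while ( cursor.next() ) { ServerEntry entry = cursor.get(); ... } - no NamingEnumeration anywhere.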
 

> > Depending
> > on the filter used, this is a composite Cursor which leverages partition
> > indices to position itself without having to buffer results.  This allows
> > the server to pull entries satisfying the search scope and filter one at a
> > time, and return them to the client without massive latency or memory
> > consumption.  It also means we can process many more concurrent requests as
> > well as process single requests faster.  In addition, a resultant search
> > Cursor can be advanced or positioned using the methods described above just by
> > having the nested Cursors based on indices advance or jump to the appropriate
> > Index positions.  We already have some of these footprint benefits with
> > NamingEnumerations, however the positioning and advancing capabilities are
> > not present with NamingEnumerations.

> Further experiments and research will help a lot here. We may have
> problems too, as this will be a concurrent part: some of the data may
> be modified while the cursor is being read.

You just stepped onto a very interesting topic: isolation.  One of the documentation items I put into the definition of Cursors was the fact that they are fully isolated.  However, we can vary that if we like.  They could be set up to traverse results constrained to a specific revision and below in the server.

Right now, of course, there's no way I can control this, so we do get dirty reads.  I can however prevent this by leveraging an index on a revision (once we define it), or even on the modifyTimestamp.  This can be used to constrain the results returned by the Cursor so dirty reads do not occur.
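Just to illustrate the idea (nothing like this exists yet - the revision attribute and the getRevision() helper below are imaginary), the filtering could be as dumb as a wrapper around the underlying Cursor that skips anything written after the revision the search started at:

// Hypothetical sketch: skip entries committed after the revision the search
// started at, so the caller never sees changes made while it is iterating.
// getRevision() is imaginary; it would read whatever revision attribute
// (or modifyTimestamp-derived ordinal) we end up indexing.
boolean nextVisible( Cursor<ServerEntry> cursor, long searchRevision ) throws Exception
{
    while ( cursor.next() )
    {
        ServerEntry candidate = cursor.get();

        if ( getRevision( candidate ) <= searchRevision )
        {
            return true;
        }
    }

    return false;
}

A revision index would make this cheap, since the nested Cursors could skip whole ranges instead of testing entry by entry.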
 

> >
> > During the course of this work, I questioned whether or not client side
> > Cursors would be possible.  Perhaps not under the current protocol without
> > some controls or extended operations.  Technical barriers in the protocol
> > aside, I started to dream about how this neat capability could impact
> > clients.  With it, clients can advance or position themselves over the
> > results of a search as they like.  Clients may even be able to freeze a
> > search in progress, by having the server tuck away the server side Cursor's
> > state, to be revisited later.

> The major improvement with client Cursors is that the client won't
> have to manage a cache of data anymore. Thinking about the Studio, if
> you browse a big tree with thousands of entries, when you want to get
> the entries from [200-300] - assuming you show entries in blocks of 100 -
> you have to send another search request _or_ you have to cache all the
> search results in memory. What a waste of time, or a waste of memory!
> If we provide such a mechanism, the client won't have to bother with
> such complexity. Data will be brought to the client piece by piece:
> if the client wants numbers 400 to 500, there is no need to get the first
> 399 entries. If the client has already pumped out the first 100 entries, it's
> just a simple request on the same cursor, no need to compute it again.
>
> So, yes, client cursors make sense too.

Yeah I was thinking about using these constructs in Studio to make it more efficient at dealing with very large directories. 

Thankfully Howard pointed out that we need the VLV (Virtual List View) control with server-side sorting to achieve this, rather than just the PSR (paged search results) control.  The PSR control, I guess, really just determines what would perhaps become some kind of batch size for the Cursor to work with.
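To spell out why PSR feels more like a batch size than a cursor: with the standard JNDI paged results control you can only walk forward, page by page, by feeding the cookie back to the server - there is no way to say "jump to entry 400".  A minimal example (plain JNDI against a local server; the host, port and base DN are just placeholders):

import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.PagedResultsControl;
import javax.naming.ldap.PagedResultsResponseControl;

public class PagedSearch
{
    public static void main( String[] args ) throws Exception
    {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put( Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory" );
        env.put( Context.PROVIDER_URL, "ldap://localhost:10389" );

        LdapContext ctx = new InitialLdapContext( env, null );

        SearchControls sc = new SearchControls();
        sc.setSearchScope( SearchControls.SUBTREE_SCOPE );

        int pageSize = 100;   // all PSR really gives us: a batch size
        byte[] cookie = null;

        do
        {
            // Ask for the next page; the cookie means "continue where we left off".
            ctx.setRequestControls( new Control[]
                { new PagedResultsControl( pageSize, cookie, Control.CRITICAL ) } );

            NamingEnumeration<SearchResult> results =
                ctx.search( "ou=system", "(objectClass=*)", sc );

            while ( results.hasMore() )
            {
                SearchResult result = results.next();
                // ... consume one entry of the current page
            }

            // Fish the cookie out of the response; null or empty means no more pages.
            cookie = null;

            Control[] responseControls = ctx.getResponseControls();

            if ( responseControls != null )
            {
                for ( Control control : responseControls )
                {
                    if ( control instanceof PagedResultsResponseControl )
                    {
                        cookie = ( ( PagedResultsResponseControl ) control ).getCookie();
                    }
                }
            }
        }
        while ( ( cookie != null ) && ( cookie.length > 0 ) );

        ctx.close();
    }
}

Every page is just a forward-only continuation of the same search, which is why PSR on its own only buys us a batch size, while VLV's offset is what would actually give us positioning.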
 
...


> > This way the client has the ability to intervene in what
> > otherwise would be a long flood of results in a single large search
> > operation.  If this page control could also convey information about
> > positioning and directionality, along with a page size set to 1, we could
> > implement client side Cursors with the same capabilities they possess on the
> > server.

> Exactly! For instance, using a negative size would mean going
> backward. This is a very minor extension to the paged search RFC, and
> it can even be implemented using the very same control, simply by adding
> some semantics to it.

Again, so sorry - I confused the VLV and PSR controls.  Now that I have it straight, I need to look at the draft specification for VLV, which looks ancient.  I wonder why it never made it to RFC.
 

> Another extension would be to add a 'position' to start with.

Yes, that would be neat; I have some ideas on this.  First I want to reread the PSR RFC and the VLV draft just to get back up to speed with them.  Perhaps we need a new control (as a last resort), but I don't want to do that if we don't have to.
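For the record, my reading of the VLV draft is that its request already carries exactly the positioning information we are talking about.  Rendered as a little Java value object (just to show how it would map onto Cursor positioning - this is not a proposed class):

// Rough Java rendering of what the VLV draft's request control carries,
// as I read the draft; shown only to map it onto Cursor positioning.
public class VlvRequest
{
    // How many entries to return around the target position.
    int beforeCount;
    int afterCount;

    // Target selection, option 1: an absolute offset into the sorted result
    // set (e.g. offset = 400 to jump straight to entry 400), plus the
    // client's current guess of the total content count.
    Integer offset;
    Integer contentCount;

    // Target selection, option 2: the first entry whose sort key is greater
    // than or equal to this assertion value.
    String greaterThanOrEqual;

    // Opaque server-side context, much like the PSR cookie; this is where a
    // server could tuck away Cursor state between requests.
    byte[] contextId;
}

The offset plus the context ID is basically "position this Cursor and remember where it is", which is why VLV with server-side sorting looks like the right vehicle.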

Alex
