From: "Alex Karasulu" <akarasulu@gmail.com>
To: "Apache Directory Developers List" <dev@directory.apache.org>, elecharny@iktek.com
Date: Sun, 23 Mar 2008 14:28:13 -0400
Subject: Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification

On Sun, Mar 23, 2008 at 5:35 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
> Hi Alex,
>
> ...
>
> We have to extend this [client] API to cover all the LDAP
> operations (connect, disconnect, send and receive messages, controls,
> etc.)

I'd love to base our delegated authentication and proxy capabilities on this new API rather than polluting the server with more JNDI code.  I would also rather see the GSoC project on LDAP Object Mapping use this API instead of JNDI.  I think I'm done with JNDI forever - I can't stand it anymore.
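To make that concrete, here is a very rough sketch of the shape I have in mind for the client API.  None of these types exist yet - LdapConnection, SearchScope and LdapException are placeholder names I just made up - so treat this as a doodle, not a proposal:

// Placeholder sketch only: LdapConnection, SearchScope and LdapException are
// made-up names; Cursor and Entry are the shared types already discussed on
// this thread.
public interface LdapConnection
{
    void connect( String host, int port ) throws LdapException;

    void bind( String dn, String credentials ) throws LdapException;

    // Search hands back a Cursor over entries instead of a NamingEnumeration,
    // so a caller can advance, rewind or reposition over the results.
    Cursor<Entry> search( String baseDn, String filter, SearchScope scope )
        throws LdapException;

    void unBind() throws LdapException;

    void close() throws LdapException;
}

The point is simply that search() returns a Cursor rather than a NamingEnumeration, so nothing JNDI leaks into the server or into clients built on top of it.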
 
...
 

> > The Partition interface in ApacheDS will soon expose search results by
> > returning a Cursor<ServerEntry> instead of a NamingEnumeration.
>
> It will be Cursor<Entry> (as this is the top level interface)

I think you might have read the line above wrong: the Partition interface is inside the server, so the Cursors it returns would traverse ServerEntry objects, not Entry objects.
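For anyone who hasn't looked at the branch, the positioning methods I keep referring to look roughly like this - a trimmed-down sketch, not the exact interface we will ship:

// Trimmed-down sketch of the Cursor contract; the real interface has a few
// more methods, but these are the ones that matter for this discussion.
public interface Cursor<E>
{
    boolean next() throws Exception;        // advance to the next element
    boolean previous() throws Exception;    // step back to the previous element

    void beforeFirst() throws Exception;    // jump before the first element
    void afterLast() throws Exception;      // jump after the last element

    void before( E element ) throws Exception;  // position just before an element
    void after( E element ) throws Exception;   // position just after an element

    E get() throws Exception;               // the element at the current position
    void close() throws Exception;
}

So a Partition search ends up as a simple loop: while ( cursor.next() ) { ServerEntry entry = cursor.get(); ... } - no NamingEnumeration anywhere.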
 

> > Depending
> > on the filter used, this is a composite Cursor which leverages partition
> > indices to position itself without having to buffer results.  This allows
> > the server to pull entries satisfying the search scope and filter one at a
> > time, and return them to the client without massive latency or memory
> > consumption.  It also means we can process many more concurrent requests as
> > well as process single requests faster.  In addition, a resultant search
> > Cursor can be advanced or positioned using the methods described above just by
> > having the nested Cursors based on indices advance or jump to the appropriate
> > Index positions.  We already have some of these footprint benefits with
> > NamingEnumerations, however the positioning and advancing capabilities are
> > not present with NamingEnumerations.

> Further experiments and research will help a lot here. We may have
> problems too, as this will be a concurrent part: some of the data may
> be modified while the cursor is being read.

You just stepped onto a very interesting topic: isolation.  One of the documentation items I put into the definition of Cursors was the fact that they are fully isolated.  However, we can vary that if we like.  They could be set up to traverse results constrained to a specific revision and below in the server.

Right now, of course, there's no way I can control this, so we do get dirty reads.  I can however prevent this by leveraging an index on a revision (once we define it), or even on the modifyTimestamp.  This can be used to constrain the results returned by the Cursor so dirty reads do not occur.
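Just to illustrate the idea (nothing like this exists yet - the revision attribute and the getRevision() helper below are imaginary), the filtering could be as dumb as a wrapper around the underlying Cursor that skips anything written after the revision the search started at:

// Hypothetical sketch: skip entries committed after the revision the search
// started at, so the caller never sees changes made while it is iterating.
// getRevision() is imaginary; it would read whatever revision attribute
// (or modifyTimestamp-derived ordinal) we end up indexing.
boolean nextVisible( Cursor<ServerEntry> cursor, long searchRevision ) throws Exception
{
    while ( cursor.next() )
    {
        ServerEntry candidate = cursor.get();

        if ( getRevision( candidate ) <= searchRevision )
        {
            return true;
        }
    }

    return false;
}

A revision index would make this cheap, since the nested Cursors could skip whole ranges instead of testing entry by entry.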
 

> >
> > During the course of this work, I questioned whether or not client side
> > Cursors would be possible.  Perhaps not under the current protocol without
> > some controls or extended operations.  Technical barriers in the protocol
> > aside, I started to dream about how this neat capability could impact
> > clients.  With it, clients can advance or position themselves over the
> > results of a search as they like.  Clients may even be able to freeze a
> > search in progress, by having the server tuck away the server side Cursor's
> > state, to be revisited later.

> The major improvement with client Cursors is that the client won't
> have to manage a cache of data anymore. Thinking about the Studio, if
> you browse a big tree with thousands of entries, when you want to get
> the entries from [200-300] - assuming you show entries in blocks of 100 -
> you have to send another search request _or_ you have to cache all the
> search results in memory. What a waste of time, or a waste of memory!
> If we provide such a mechanism, the client won't have to bother with
> such complexity. Data will be brought to the client piece by piece:
> if the client wants numbers 400 to 500, there is no need to get the first
> 399 entries. If the client has already pumped out the first 100 entries, it's
> just a simple request on the same cursor, no need to compute it again.
>
> So, yes, client cursors make sense too.

Yeah I was thinking about using these constructs in Studio to make it more efficient at dealing with very large directories. 

Thankfully Howard pointed out that we need the VLV (Virtual List View) control with server-side sorting to achieve this, rather than just the PSR (paged search results) control.  The PSR control, I guess, really just determines what would perhaps become some kind of batch size for the Cursor to work with.
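To spell out why PSR feels more like a batch size than a cursor: with the standard JNDI paged results control you can only walk forward, page by page, by feeding the cookie back to the server - there is no way to say "jump to entry 400".  A minimal example (plain JNDI against a local server; the host, port and base DN are just placeholders):

import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.PagedResultsControl;
import javax.naming.ldap.PagedResultsResponseControl;

public class PagedSearch
{
    public static void main( String[] args ) throws Exception
    {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put( Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory" );
        env.put( Context.PROVIDER_URL, "ldap://localhost:10389" );

        LdapContext ctx = new InitialLdapContext( env, null );

        SearchControls sc = new SearchControls();
        sc.setSearchScope( SearchControls.SUBTREE_SCOPE );

        int pageSize = 100;   // all PSR really gives us: a batch size
        byte[] cookie = null;

        do
        {
            // Ask for the next page; the cookie means "continue where we left off".
            ctx.setRequestControls( new Control[]
                { new PagedResultsControl( pageSize, cookie, Control.CRITICAL ) } );

            NamingEnumeration<SearchResult> results =
                ctx.search( "ou=system", "(objectClass=*)", sc );

            while ( results.hasMore() )
            {
                SearchResult result = results.next();
                // ... consume one entry of the current page
            }

            // Fish the cookie out of the response; null or empty means no more pages.
            cookie = null;

            Control[] responseControls = ctx.getResponseControls();

            if ( responseControls != null )
            {
                for ( Control control : responseControls )
                {
                    if ( control instanceof PagedResultsResponseControl )
                    {
                        cookie = ( ( PagedResultsResponseControl ) control ).getCookie();
                    }
                }
            }
        }
        while ( ( cookie != null ) && ( cookie.length > 0 ) );

        ctx.close();
    }
}

Every page is just a forward-only continuation of the same search, which is why PSR on its own only buys us a batch size, while VLV's offset is what would actually give us positioning.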
 
...


> > This way the client has the ability to intervene in what
> > otherwise would be a long flood of results in a single large search
> > operation.  If this page control could also convey information about
> > positioning and directionality, along with a page size set to 1, we could
> > implement client side Cursors with the same capabilities they possess on the
> > server.

> Exactly! For instance, using a negative size would mean going
> backward. This is a very minor extension to the paged search RFC, and
> it can even be implemented using the very same control, simply by adding
> some semantics to it.

Again, so sorry - I confused the VLV and PSR controls.  Now that I have it straight, I need to look at the draft specification for VLV, which looks ancient.  I wonder why it never made it to RFC.
 

> Another extension would be to add a 'position' to start with.

Yes, that would be neat; I have some ideas on this.  First I want to reread the PSR RFC and the VLV draft just to get back up to speed with them.  Perhaps we need a new control (as a last resort), but I don't want to do that if we don't have to.
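For the record, my reading of the VLV draft is that its request already carries exactly the positioning information we are talking about.  Rendered as a little Java value object (just to show how it would map onto Cursor positioning - this is not a proposed class):

// Rough Java rendering of what the VLV draft's request control carries,
// as I read the draft; shown only to map it onto Cursor positioning.
public class VlvRequest
{
    // How many entries to return around the target position.
    int beforeCount;
    int afterCount;

    // Target selection, option 1: an absolute offset into the sorted result
    // set (e.g. offset = 400 to jump straight to entry 400), plus the
    // client's current guess of the total content count.
    Integer offset;
    Integer contentCount;

    // Target selection, option 2: the first entry whose sort key is greater
    // than or equal to this assertion value.
    String greaterThanOrEqual;

    // Opaque server-side context, much like the PSR cookie; this is where a
    // server could tuck away Cursor state between requests.
    byte[] contextId;
}

The offset plus the context ID is basically "position this Cursor and remember where it is", which is why VLV with server-side sorting looks like the right vehicle.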

Alex
