directory-dev mailing list archives

From "Emmanuel Lecharny" <elecha...@gmail.com>
Subject Re: [LDAP] [Client] Client side Cursors can help w/ LDAP paging and notification
Date Sun, 23 Mar 2008 09:35:06 GMT
Hi Alex,

On Sun, Mar 23, 2008 at 2:30 AM, Alex Karasulu <akarasulu@apache.org> wrote:
> While implementing the Cursor pattern in the JDBM based partition, some
> interesting ideas came to mind regarding the new LDAP Client API we've been
> working on.  Up to now we have simply defined what Entries look like and
> have called it the Entry API instead.  Eventually this will grow into a full
> client as we combine more of the pieces together.  JNDI is a less than
> optimal API for LDAP.

That's for sure! We have to extend this API to cover all the LDAP
operations (connect, disconnect, send and receive messages, controls,
etc.)

>
> Cursors are special, in contrast to the NamingEnumerations we use, because
> they can be positioned after creation.  Cursors can be positioned using the
> beforeFirst(), first(), last() and afterLast() methods at any point during
> their use.  Furthermore, Cursors are bidirectional; they can be traversed in
> both directions with calls to next() and previous() to navigate results.
> Callers can advance a Cursor forward or reverse at any time after creation.
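Here is a minimal Java sketch of a bidirectional cursor over an in-memory list, to make these semantics concrete. The positioning and navigation method names come from the description above. The get() accessor, the boolean return values and the ListCursor class are assumptions for illustration, not the actual ApacheDS Cursor interface.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical cursor contract modeled on the methods named in the thread.
interface Cursor<E> {
    void beforeFirst();   // position before the first element
    void afterLast();     // position after the last element
    boolean first();      // jump to the first element; false if empty
    boolean last();       // jump to the last element; false if empty
    boolean next();       // advance; false once past the last element
    boolean previous();   // step back; false once before the first element
    E get();              // element at the current position
}

// In-memory implementation: pos == -1 means "before first",
// pos == items.size() means "after last".
class ListCursor<E> implements Cursor<E> {
    private final List<E> items;
    private int pos = -1;

    ListCursor(List<E> items) { this.items = items; }

    public void beforeFirst() { pos = -1; }
    public void afterLast()   { pos = items.size(); }
    public boolean first()    { pos = 0; return !items.isEmpty(); }
    public boolean last()     { pos = items.size() - 1; return !items.isEmpty(); }

    public boolean next() {
        if (pos < items.size()) pos++;
        return pos < items.size();
    }

    public boolean previous() {
        if (pos >= 0) pos--;
        return pos >= 0;
    }

    public E get() {
        if (pos < 0 || pos >= items.size()) {
            throw new NoSuchElementException("cursor is not on an element");
        }
        return items.get(pos);
    }
}
```

The before-first and after-last positions act as sentinels. That is what lets a caller start a traversal from either end without special-casing an empty result.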

Using cursors in ADS will also make it very easy to implement the Paged
Results control (RFC 2696)! We could even define a new control (and a new
RFC), since we will be able to go back and forth, which is not possible
with the paging control.

> The Partition interface in ApacheDS will soon expose search results by
> returning a Cursor<ServerEntry> instead of a NamingEnumeration.

It will be Cursor<Entry> (as this is the top-level interface).

> Depending on the filter used, this is a composite Cursor which leverages
> partition indices to position itself without having to buffer results.  This allows
> the server to pull entries satisfying the search scope and filter one at a
> time, and return them to the client without massive latency or memory
> consumption.  It also means we can process many more concurrent requests as
> well as process single requests faster.  In addition, a resultant search
> Cursor can be advanced or positioned using the methods described above just by
> having nested Cursors based on indices advance or jump to the appropriate
> Index positions.  We already have some of these footprint benefits with
> NamingEnumerations; however, the positioning and advancing capabilities are
> not present with NamingEnumerations.
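The sketch below shows how a composite Cursor can avoid buffering. It wraps a candidate Cursor, which stands in for an index Cursor, and tests the filter only as the caller moves in either direction. EntryCursor, IndexCursor, FilteringCursor and the Predicate-based filter are hypothetical simplifications, not the classes in the JDBM partition.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical bidirectional cursor over search candidates.
interface EntryCursor<E> {
    boolean next();
    boolean previous();
    E get();
}

// Stand-in for an index-backed cursor: here just a list of candidates.
class IndexCursor<E> implements EntryCursor<E> {
    private final List<E> candidates;
    private int pos = -1;

    IndexCursor(List<E> candidates) { this.candidates = candidates; }

    public boolean next() { if (pos < candidates.size()) pos++; return pos < candidates.size(); }
    public boolean previous() { if (pos >= 0) pos--; return pos >= 0; }
    public E get() { return candidates.get(pos); }
}

// Composite cursor: skips candidates that fail the filter one at a time,
// so nothing is buffered beyond the current position.
class FilteringCursor<E> implements EntryCursor<E> {
    private final EntryCursor<E> wrapped;
    private final Predicate<E> filter;
    int evaluations = 0; // number of filter checks, to show the work is lazy

    FilteringCursor(EntryCursor<E> wrapped, Predicate<E> filter) {
        this.wrapped = wrapped;
        this.filter = filter;
    }

    public boolean next() {
        while (wrapped.next()) {
            evaluations++;
            if (filter.test(wrapped.get())) return true;
        }
        return false;
    }

    public boolean previous() {
        while (wrapped.previous()) {
            evaluations++;
            if (filter.test(wrapped.get())) return true;
        }
        return false;
    }

    public E get() { return wrapped.get(); }
}
```

The filter is evaluated one candidate at a time. Asking for the first match therefore only touches the candidates up to that match, and the rest of the index is never read.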

Further experiments and research will help a lot here. We may run into
problems too, as this part will be concurrent: some of the data may
be modified while the cursor is being read.

>
> During the course of this work, I questioned whether or not client side
> Cursors would be possible.  Perhaps not under the current protocol without
> some controls or extended operations.  Technical barriers in the protocol
> aside, I started to dream about how this neat capability could impact
> clients.  With it, clients can advance or position themselves over the
> results of a search as they like.  Clients may even be able to freeze a
> search in progress, by having the server tuck away the server side Cursor's
> state, to be revisited later.

The major improvement with client cursors is that the client will no
longer have to manage a cache of data. Think about Studio: if you
browse a big tree with thousands of entries and want to get entries
[200-300], assuming you show entries in blocks of 100, you have to send
another search request _or_ cache all the search results in memory.
What a waste of time or a waste of memory! If we provide such a
mechanism, the client won't have to bother with such complexity. Data
will be brought to the client piece by piece: if the client wants
entries 400 to 500, there is no need to fetch all the entries before
them. If the client has already pumped out the first 100 entries, the
next ones are just a simple request on the same cursor, with no need
to compute anything again.

So, yes, client cursors make sense too.

> For lack of a better term, I've likened this to a form
> of asynchronous bidirectional LDAP search. This would eliminate the need to
> bother with paging controls.  It could even be used to eliminate the thread
> per search problem associated with persistent search.  OK, let me stop
> dreaming and start looking at reality so we can determine if this is even a
> possibility.

Reality is just a dream come true :) (sometimes it's a nightmare :)

>
> So these characteristics of a Cursor have a profound impact on the semantics
> of a search operation - not talking about the protocol yet.  I'm referring
> to search as seen from the perspective of client callers using the Cursor:
> the front end.  As stated, search operations can be initiated and shelved to
> persist the state of the search by tucking away the Cursor in the connection
> session.  A Cursor for a search will automatically track its position.
>
> However the protocol imposes some limitations on being able to leverage
> these capabilities across the network on an LDAP client.  A search request
> begins the search, and entry responses are received from the server, until
> the server returns a search result done response, which signals the end
> of the search operation.  During this sequence, without creative extended
> operations or controls, there's little the client can do to influence the
> entries returned by the server or throttle the rate of return.  Of course,
> size and time limits can be set on the search request, but after issuing the
> search, these cannot be altered.  Because the LdapMessage envelope contains a
> messageId, and all responses contain the messageId of the request they
> correspond to, the protocol allows for multiple requests to be issued in
> parallel.  Even if client APIs do not allow for it, this is certainly
> possible.
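The messageId point can be illustrated with a small dispatcher. One reader thread hands each decoded response to the handler registered for that messageId, so several requests can be outstanding on a single connection. Response and ResponseDispatcher are hypothetical types, not part of any existing LDAP client API. The only protocol fact relied on is that messageId 0 is reserved for unsolicited notifications (RFC 4511).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical decoded response: the messageId of the request it answers
// plus an opaque payload.
class Response {
    final int messageId;
    final String payload;

    Response(int messageId, String payload) {
        this.messageId = messageId;
        this.payload = payload;
    }
}

// Routes each incoming response to the handler registered for its
// messageId, so several requests can be outstanding on one connection.
class ResponseDispatcher {
    // messageId 0 is reserved for unsolicited notifications (RFC 4511),
    // so request ids start at 1.
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, Consumer<Response>> pending = new ConcurrentHashMap<>();

    // Allocates a messageId for a new request and remembers its handler.
    int register(Consumer<Response> handler) {
        int id = nextId.getAndIncrement();
        pending.put(id, handler);
        return id;
    }

    // Called by the single reader thread for every response read off the wire.
    // Returns false when no request with that messageId is outstanding.
    boolean dispatch(Response response) {
        Consumer<Response> handler = pending.get(response.messageId);
        if (handler == null) return false;
        handler.accept(response);
        return true;
    }

    // Called once the final response for a request (e.g. search result done) arrives.
    void complete(int messageId) {
        pending.remove(messageId);
    }
}
```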

The main point is that each client is associated with a session. It's
then easy to handle a context and use it to store metadata (like a
previously created cursor on some search request, a cursor which can be
reused if the underlying data have not been modified).

That brings another matter to the table: if we want to reuse cursors,
we _must_ implement a decent entry cache.
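One simple way to honor "reusable only if the underlying data have not been modified" is to save each cursor with a data version number. The saved cursor is refused once the version has moved. This is a minimal sketch assuming a single global write counter (dataVersion) and a per-session map keyed by something like a paging cookie; SessionCursorStore is a hypothetical name. A single counter is deliberately coarse, because any write invalidates every saved cursor.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-session store of open search cursors, keyed by a request
// identifier such as a paging cookie.
class SessionCursorStore<E> {
    // Assumed global write counter: bumped on every add, modify, delete, etc.
    static final AtomicLong dataVersion = new AtomicLong();

    // A saved cursor together with the data version it was created under.
    private static final class Saved<T> {
        final Iterator<T> cursor;
        final long version;

        Saved(Iterator<T> cursor, long version) {
            this.cursor = cursor;
            this.version = version;
        }
    }

    private final Map<String, Saved<E>> saved = new HashMap<>();

    void put(String key, Iterator<E> cursor) {
        saved.put(key, new Saved<>(cursor, dataVersion.get()));
    }

    // Returns the saved cursor if nothing was written since it was stored.
    // A stale cursor is dropped and null is returned, so the caller re-runs the search.
    Iterator<E> reuse(String key) {
        Saved<E> s = saved.get(key);
        if (s == null) return null;
        if (s.version != dataVersion.get()) {
            saved.remove(key);
            return null;
        }
        return s.cursor;
    }
}
```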
>
> Although I've long forgotten how the paging control works exactly, I still
> have a rough idea: forgive me for my laziness and if I'm missing something.
> A control specifies some number of results to return per page, and the
> server complies by limiting the search to that number then capping off the
> search operation with a search result done.  Cookies in the request and
> response controls are used to track the progress, so another search request
> for the next page returns the next page rather than initiating the search
> from the start.  This breaks a big search up into many smaller search
> requests.

This is true from the client's perspective. On the server, there should
be only one search, and the remaining results just wait for
another search request with the same cookie.
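The sketch below shows the server side of that description. The search runs once, and between pages its iterator is parked in a map under a fresh cookie. The request shape follows RFC 2696: a size plus a cookie, with an empty cookie on the first request and on the last response. PagedSearchHandler, Page and the UUID-based cookies are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical server-side handling of simple paged results (RFC 2696):
// the search runs once and its iterator is parked under a cookie between pages.
class PagedSearchHandler<E> {
    static final class Page<T> {
        final List<T> entries;
        final String cookie; // empty string means "no more results"

        Page(List<T> entries, String cookie) {
            this.entries = entries;
            this.cookie = cookie;
        }
    }

    private final Map<String, Iterator<E>> parked = new HashMap<>();

    // The cookie is empty on the first request; later requests echo the last cookie.
    Page<E> search(Supplier<Iterator<E>> runSearch, int size, String cookie) {
        Iterator<E> cursor = cookie.isEmpty() ? runSearch.get() : parked.remove(cookie);
        if (cursor == null) throw new IllegalArgumentException("unknown cookie");

        List<E> entries = new ArrayList<>();
        while (entries.size() < size && cursor.hasNext()) entries.add(cursor.next());

        if (!cursor.hasNext()) return new Page<>(entries, ""); // last page
        String next = UUID.randomUUID().toString();
        parked.put(next, cursor); // park the live cursor for the next request
        return new Page<>(entries, next);
    }
}
```

Only one search is executed no matter how many pages are requested. Each follow-up request simply resumes the parked iterator.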

> This way the client has the ability to intervene in what
> otherwise would be a long flood of results in a single large search
> operation.  If this page control could also convey information about
> positioning and directionality, along with a page size set to 1, we could
> implement client side Cursors with the same capabilities they possess on the
> server.

Exactly! For instance, a negative size could mean going
backward. This is a very minor extension to the paged search RFC, and
it can even be implemented using the very same control, simply by adding
some semantics to it.

Another extension would be to add a starting 'position'.
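The two extensions discussed here, a signed size and a starting position, could behave as in the sketch below. RFC 2696 only defines a non-negative size and a cookie, so ExtendedPager and its (position, size) semantics are hypothetical. A plain list stands in for the server-side Cursor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension of the paged results control: a starting position
// plus a signed size (negative = read backward from the position).
class ExtendedPager<E> {
    private final List<E> results; // stands in for the server-side cursor

    ExtendedPager(List<E> results) { this.results = results; }

    // Returns up to |size| entries starting at 'position' (0-based), moving
    // forward for size > 0 and backward for size < 0. Entries come back in
    // traversal order, so a backward page is in reverse order.
    List<E> page(int position, int size) {
        List<E> out = new ArrayList<>();
        int step = size >= 0 ? 1 : -1;
        int remaining = Math.abs(size);
        for (int i = position; remaining > 0 && i >= 0 && i < results.size(); i += step) {
            out.add(results.get(i));
            remaining--;
        }
        return out;
    }
}
```

This sketch makes one design choice explicit: a backward page includes the starting position and arrives in reverse order, which a client would reverse again for display.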

> Paging search results effectively has the server tucking away the
> search Cursor state into the client session and pulling it out again to
> continue.  This is how we would implement this control today (that is if
> anyone gets the time to do so :) ).
>
> Persistent search is unrelated however I'd like to explore whether or not
> there's some possible synergy/relationship between it and paging.
> (Persistent search IMHO is poorly implemented but we can deal with it). It
> is intended for receiving change notifications.  A persistent search control
> is issued on a search request which may return a bunch of entry responses
> for the filter if requested, but the most notable thing is that the search
> does not end with a search result done response.  The operation persists to
> return entries satisfying the filter along with a response control
> containing the change type when entries satisfying the filter change
> accordingly.  Clients usually need to assign a thread to listen for such
> responses.  Smart clients will use a single thread instead of one per
> persistent search.  Even smarter clients will use a single thread to listen
> for search responses on persistent search requests and for unsolicited
> notifications.  Regardless, once the persistent search request is issued,
> there's no way the client can stop it until size and time limits are
> reached.  These are parameters of the search request that carries the control.
> Of course persistent search requests can use a size limit of 1 and clients
> can request another persistent search after an event is received to have
> more control.  Regardless, a change may not occur for a while, in which case
> fine tuning with the time limit will help.

The beauty of this cursor approach is that it can be used to
implement persistent search so easily! In fact, a cursor is just a
persistent search, until you discard it :)


>
> It's a bit crazy to think what would happen if both these controls are used
> together on the same search request.  I guess it all depends on the server
> implementation in the end.  Not sure if anyone would even want to do this.
> Regardless of the pain it would entail, I think this situation can be
> managed to work in the server.  Now where does this lead us tho?  Perhaps
> the Cursor interface could be enhanced to support a listener to
> asynchronously notify users of changes to the underlying results.  The
> Cursor can then be reset (or just repositioned).  Of course this is all
> presuming the Cursor was created to traverse results.  Instead a Cursor
> might just be created to iterate over only changes, but does this make
> sense?  Whatever the answer, at least we can know when underlying results
> have changed to invalidate the Cursor on the client.   This is probably a
> bullsh*t idea but it's entertaining to think about.
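Here is a sketch of the listener idea. The cursor is flagged invalid when a change notification arrives, forwards the change type to its listeners, and refuses to advance until it is repositioned. Repositioning re-runs the search only if the results were invalidated. ListenableCursor, resultsChanged() and the String change type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical cursor that can be told its underlying results have changed.
class ListenableCursor<E> {
    private final Supplier<List<E>> search; // re-runs the search
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private List<E> results;
    private int pos = -1;
    private boolean valid = true;

    ListenableCursor(Supplier<List<E>> search) {
        this.search = search;
        this.results = search.get();
    }

    void addChangeListener(Consumer<String> listener) { listeners.add(listener); }

    // Called, e.g. by a notification handler, when an entry matching the
    // search changed; the change type is forwarded to every listener.
    void resultsChanged(String changeType) {
        valid = false;
        for (Consumer<String> l : listeners) l.accept(changeType);
    }

    boolean isValid() { return valid; }

    // Repositions before the first result, re-running the search only if
    // the current results were invalidated.
    void beforeFirst() {
        if (!valid) {
            results = search.get();
            valid = true;
        }
        pos = -1;
    }

    boolean next() {
        if (!valid) throw new IllegalStateException("results changed; reposition first");
        if (pos < results.size()) pos++;
        return pos < results.size();
    }

    E get() { return results.get(pos); }
}
```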

Yeah, not sure it's worth the price, but at least we have the tools to build it!

>
> BTW change notifications are probably best implemented as a combination of
> search and extended operations through unsolicited notifications.   The
> client issues a search request with a control similar to the persistent
> search request control.  Instead of 'persisting' the search, the search
> returns immediately with a search result done response using a result code
> to indicate whether or not the server will honor the request to be notified
> of changes.

This is a big semantic shift... Not sure that it will fit with the
current LDAP protocol. However, LDAP V4 does not exist yet ;)

> Then the client is done registering the request to be notified.
> Whenever the server detects changes that satisfy the changeType, scope and
> filter of the notification registration, it sends an unsolicited extended
> response to the client.  The payload carries information about the change
> which took place similar to the way search entry results do with the
> response control of persistent search.   The client can issue a
> deregistration message to the server to stop receiving these notifications
> using an ExtendedRequest.  The server would respond to this with an
> ExtendedResponse.  IMO this is a much better mechanism with full control
> over the subscription and notification process.
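The registration model above boils down to a server-side registry of subscriptions. In this sketch each subscription holds four things:

- the requested change types;
- a base DN, with subtree scope reduced to a plain suffix match and no DN normalization;
- a predicate standing in for the LDAP filter;
- a sink that would send the unsolicited ExtendedResponse.

NotificationRegistry, ChangeType and all method names are hypothetical. The wire encoding of the registration, notification and deregistration messages is not modeled.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Change types, named after those used by the persistent search control.
enum ChangeType { ADD, DELETE, MODIFY, MODDN }

// Hypothetical server-side registry of change-notification subscriptions.
class NotificationRegistry {
    private static final class Subscription {
        final Set<ChangeType> types;
        final String baseDn;                       // subtree scope as a suffix match
        final Predicate<String> filter;            // stands in for an LDAP filter
        final BiConsumer<ChangeType, String> sink; // would send the unsolicited response

        Subscription(Set<ChangeType> types, String baseDn,
                     Predicate<String> filter, BiConsumer<ChangeType, String> sink) {
            this.types = types;
            this.baseDn = baseDn;
            this.filter = filter;
            this.sink = sink;
        }

        boolean matches(ChangeType type, String dn) {
            boolean inScope = dn.equals(baseDn) || dn.endsWith("," + baseDn);
            return types.contains(type) && inScope && filter.test(dn);
        }
    }

    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, Subscription> subscriptions = new ConcurrentHashMap<>();

    // Registration: returns an id the client later uses to deregister.
    long register(Set<ChangeType> types, String baseDn,
                  Predicate<String> filter, BiConsumer<ChangeType, String> sink) {
        long id = ids.incrementAndGet();
        subscriptions.put(id, new Subscription(types, baseDn, filter, sink));
        return id;
    }

    // Deregistration: true if a subscription with this id existed.
    boolean deregister(long id) {
        return subscriptions.remove(id) != null;
    }

    // Called by the server after each successful write operation.
    void changed(ChangeType type, String dn) {
        for (Subscription s : subscriptions.values()) {
            if (s.matches(type, dn)) s.sink.accept(type, dn);
        }
    }
}
```

Keeping the registry separate from any search operation is what gives the full control over subscription that the message argues for. Nothing is held open per registration except a map entry.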

Those are also good ideas. As we have to overload the JNDI
Listeners, I think we have to implement such a solution anyway.

Thanks Alex!

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
