directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: Implementing the PagedSearchControl
Date Fri, 05 Dec 2008 16:12:05 GMT
Alex Karasulu wrote:
> Hi Emmanuel,
>
> On Wed, Dec 3, 2008 at 6:25 PM, Emmanuel Lecharny <elecharny@gmail.com>wrote:
>
>   
>> The problem I have is the following: we have to remember a pointer to
>> the last entry we sent back to the client.
>>
>> How should we do it? My first approach was pretty naive: we are using a
>> cursor, so it's easy; we simply store the cursor in the session, and the
>> next request will just get this cursor back from the session and fetch
>> the next N elements from it.
>>
>> This has the advantage of being simple, but there are some very important
>> cons:
>> - it's memory consuming, as we may keep those cursors in the session for a
>> very long time
>> - we will have to close all the cursors when the session is closed (for
>> whatever reason)
>> - if some data has been modified since the cursor's creation, it may
>> contain invalid data
>> - if the client doesn't send an abandon request for the search, those
>> cursors will remain in the session until it's closed (this is very likely
>> to happen)
>>
>> So I'm considering an alternative - though more expensive and less
>> performant - approach:
>> - we build a new cursor for each request,
>> - we move forward to the Nth entry in the newly created cursor, and return
>> the M requested elements,
>> - and when done, we discard the cursor.
>>
>>     
>
> I would avoid this approach.  The problem is that it requires nearly a
> quadratic amount of computation, because you have to scan back to the point
> you were at before in order to advance the cursor.  Say you have 100
> entries and you read the first 10.  Then you create a new cursor and ask
> for elements 11-20.  This means you'll scan through the first 10 elements,
> checking whether each one matches the filter, and as you know each check
> shifts a nested structure of cursors that reflects the logic of the filter.
> So you're doing a search over 10, then 20, 30, 40, 50, 60 and so on
> elements.
>   
Yes, I'm aware of that. And I will certainly not go this way ...
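
For what it's worth, here is a minimal sketch of that rebuild-and-skip
approach, just to make the cost visible (the Cursor/Entry types and the
executeSearch() helper are placeholders here, not the actual ApacheDS API).
Serving page k re-scans the k*P entries already returned, so serving all N
entries costs P + 2P + 3P + ... + N advances, ie O(N^2/P) overall:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build a fresh cursor per page, skip what was
// already sent, return one page, then discard the cursor.
List<Entry> fetchPage( SearchRequest req, int pageSize, int alreadyReturned )
    throws Exception
{
    Cursor<Entry> cursor = executeSearch( req ); // fresh cursor each time

    try
    {
        // Re-scan everything we already sent: this is the quadratic part.
        for ( int i = 0; i < alreadyReturned && cursor.next(); i++ )
        {
            // skip
        }

        List<Entry> page = new ArrayList<Entry>( pageSize );

        for ( int i = 0; i < pageSize && cursor.next(); i++ )
        {
            page.add( cursor.get() );
        }

        return page;
    }
    finally
    {
        cursor.close(); // discard the cursor when done with this page
    }
}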
>
>   
>> The pros are
>> - we don't have to keep N cursors in memory forever.
>>     
>
>
> The whole point of this feature is to maintain state so the search continues
> where it left off.  But this should be cheap both for the server and for the
> client.  This approach is brute force, and it's going to mix up a lot of
> code in complicated places.
>
> It's OK to hold off on this until we see a better approach.  I'd rather wait
> until that eureka light bulb goes off.
>
>
>   
>> - from the client's POV, it respects the PagedSearch contract
>> - it's easier to implement, as we have less information to keep in the
>> session and to restore
>>
>> The cons are :
>> - it's time consuming: if we have N entries to return, with a page size of
>> P, we will construct N/P cursors.
>>
>>     
>
> Yes, and there will be costs for the advances.  Both are going to make this
> approach limiting.
>   
I'm currently moving a bit further in the other direction (ie, storing
the cursor in the session).
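
To make that concrete, here is roughly the shape I have in mind (a sketch
only; PagedSearchContext, the manager class and the cookie handling are
made-up names, not committed code). The session keeps one context per
outstanding paged search, keyed by the paged-results cookie, and the next
request carrying the same cookie resumes the stored cursor:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the state we keep per outstanding paged search.
class PagedSearchContext
{
    Cursor<Entry> cursor;          // resumed on the next paged request
    SearchRequest originalRequest; // compared against the next request
    volatile long lastAccess;      // updated on each page, used by cleanup
}

class PagedSearchManager
{
    // One context per outstanding paged search. The key is the cookie's
    // string form, because byte[] keys don't hash by content.
    private final Map<String, PagedSearchContext> pagedSearches =
        new ConcurrentHashMap<String, PagedSearchContext>();

    List<Entry> nextPage( String cookie, int pageSize ) throws Exception
    {
        PagedSearchContext ctx = pagedSearches.get( cookie );
        ctx.lastAccess = System.currentTimeMillis();

        List<Entry> page = new ArrayList<Entry>( pageSize );

        while ( page.size() < pageSize && ctx.cursor.next() )
        {
            page.add( ctx.cursor.get() );
        }

        if ( page.size() < pageSize ) // end of results: clean up eagerly
        {
            ctx.cursor.close();
            pagedSearches.remove( cookie );
        }

        return page;
    }
}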

There are vicious issues, though. Some of them are related to the way we 
have designed the server. For instance, when comparing the previous 
searchRequest with the current one, you have to compare the attributes, DN 
and filters. That's not complicated, except that those elements might 
not be equal, simply because they have not yet been normalized at this 
point (in the SearchHandler).

This is a big issue. At this point we can manage to normalize the DN 
and attributes, but for the filter, it's another story. This makes me 
think that the Normalize interceptor is not needed where it is, and that 
it should be moved up in the stack (into the codec, in fact).
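
To illustrate the comparison problem (again a sketch; the normalize
helpers are placeholders for whatever normalizer the handler can reach at
that point): the DN and attribute names can be normalized on the fly
before the equality check, but the filter is a tree of nodes, and without
normalization two semantically equal filters won't compare equal:

// Hypothetical sketch: decide whether the incoming request continues
// the stored paged search. normalizeDn()/normalizeAttributes() are
// placeholders, not an existing API.
boolean sameSearch( SearchRequest previous, SearchRequest current )
    throws Exception
{
    if ( !normalizeDn( previous.getBase() )
            .equals( normalizeDn( current.getBase() ) ) )
    {
        return false;
    }

    if ( !normalizeAttributes( previous.getAttributes() )
            .equals( normalizeAttributes( current.getAttributes() ) ) )
    {
        return false;
    }

    // The filter is an AST: without normalization, (CN=Foo) and (cn=foo)
    // are semantically equal but won't compare equal here.
    return previous.getFilter().equals( current.getFilter() );
}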

Otherwise, the other problem we have is cursor closure. When we are 
done with them, we should close those guys. This is easy if the client 
behaves correctly (ie, sends a last request with 0 as the number of 
elements to return, or if we reach the end of the entries to return), but 
if the client doesn't do that, we will end up with potentially thousands 
of open cursors in memory.

So we need to add a cleanup thread associated with each session, closing 
the cursors after a timeout has occurred.
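
One possible shape for that cleanup, using a single shared scheduled task
rather than one thread per session, so the thread count stays bounded (the
timeout value is an assumption, and pagedSearches/PagedSearchContext are
the made-up names from the earlier sketch):

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: close any cursor idle longer than the timeout.
ScheduledExecutorService reaper =
    Executors.newSingleThreadScheduledExecutor();
final long timeoutMillis = 5 * 60 * 1000; // assumed 5 minute idle timeout

reaper.scheduleAtFixedRate( new Runnable()
{
    public void run()
    {
        long now = System.currentTimeMillis();

        for ( Map.Entry<String, PagedSearchContext> e :
                  pagedSearches.entrySet() )
        {
            PagedSearchContext ctx = e.getValue();

            if ( now - ctx.lastAccess > timeoutMillis )
            {
                try { ctx.cursor.close(); } catch ( Exception ignored ) {}
                pagedSearches.remove( e.getKey() );
            }
        }
    }
}, 1, 1, TimeUnit.MINUTES );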

Those are the two problems I'm currently facing...

Otherwise, the implementation itself is pretty straightforward (well, 
not quite, but it's just simple code).

Any ideas about how to handle those two problems?
> Alex
>
>   


-- 
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org


