directory-users mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: Getting Paged Result
Date Wed, 12 Nov 2014 07:54:59 GMT
On 12/11/14 06:54, Neha Rawat wrote:
> Thanks.
> From what I understood, I can only specify pageSize but not offset.
> What if I only want back results starting from the 51st entry to the 60th entry?
You can't.
> Do I have to first fetch results from 1st-49th entry and then from 51st to
> 60th?
Yes.

You have to understand how it works internally: when you do a search on
an LDAP server, the result is not built before being returned. That would
be too costly. A set of candidates is created, which is then filtered
according to the filter you have specified. In a way, there is no way to
know in advance how many entries you'll get as a result of a search.
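To make the "candidates, then filter" idea concrete, here is a minimal
Python sketch. It is not how any particular server is implemented; the
`search` function, the `Entry` type and the sample data are hypothetical
and only show why the total is unknown until the whole candidate set has
been walked.

```python
from typing import Callable, Iterable, Iterator

Entry = dict  # hypothetical entry representation: attribute name -> value


def search(candidates: Iterable[Entry],
           matches: Callable[[Entry], bool]) -> Iterator[Entry]:
    """Yield matching entries one at a time; nothing is built up front."""
    for entry in candidates:
        if matches(entry):
            yield entry


# Hypothetical data: 200 candidate entries, every other one matches.
candidates = ({"uid": f"user{i}", "active": i % 2 == 0} for i in range(200))
results = search(candidates, lambda e: e["active"])

first = next(results)                # the first entry is available at once
remaining = sum(1 for _ in results)  # the total needs a full pass
```

The first entry comes back without the server knowing how many will
follow; counting them means consuming the whole stream.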

PagedSearch is a bit different, but not so much: it fetches N entries
from the candidate list, and returns this set. Optionally, it returns
an estimate of the total number of entries that will get returned
("the size MAY be set to the server's estimate of the total number of
entries in the entire result set" from
https://www.ietf.org/rfc/rfc2696.txt).
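For the original question (entries 51 to 60), the only approach is to
page forward and discard what is not wanted. The sketch below simulates
this in Python without any LDAP library: `ToyServer` and `fetch_range`
are hypothetical names, and the cookie handling only mimics the RFC 2696
convention that an empty cookie marks the last page.

```python
import itertools
from typing import Dict, Iterator, List, Tuple


class ToyServer:
    """Toy stand-in for a server that supports paged results (forward only)."""

    def __init__(self, total: int) -> None:
        self._total = total
        self._cursors: Dict[bytes, Iterator[str]] = {}

    def paged_search(self, page_size: int,
                     cookie: bytes = b"") -> Tuple[List[str], bytes]:
        if cookie == b"":
            # New search: open a cursor and hand back an opaque cookie.
            cursor = (f"entry{i}" for i in range(1, self._total + 1))
            cookie = f"c{len(self._cursors)}".encode()
            self._cursors[cookie] = cursor
        else:
            cursor = self._cursors[cookie]
        page = list(itertools.islice(cursor, page_size))
        if len(page) < page_size:
            # Cursor exhausted: an empty cookie tells the client to stop.
            del self._cursors[cookie]
            return page, b""
        return page, cookie


def fetch_range(server: ToyServer, first: int, last: int,
                page_size: int = 10) -> List[str]:
    """Return entries first..last (1-based) by paging from the start."""
    wanted: List[str] = []
    position = 0
    cookie = b""
    while True:
        page, cookie = server.paged_search(page_size, cookie)
        for entry in page:
            position += 1
            if first <= position <= last:
                wanted.append(entry)
        # A real client stopping early should also abandon the paged
        # search (RFC 2696: size 0 with the last cookie); omitted here.
        if cookie == b"" or position >= last:
            break
    return wanted


print(fetch_range(ToyServer(200), 51, 60))
```

The first five pages are fetched and thrown away on the client side;
there is no way to tell the server to skip them.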

What is important here is "MAY": it's extremely costly to compute this
number accurately on the server beforehand (don't get me wrong: when you
do a search, we do process all the entries the exact same way, but at
least we don't store the result set in memory).

At this point, to provide this size, we have two options:
- either we process the full search to count the number of entries we
will return, without keeping any of them, then we redo the search and
return the entries the normal way (i.e., one after the other). This is
CPU intensive but keeps memory usage low
- or we process the full search and store the result in memory before
sending it. This time we save CPU but use a lot of memory

Both options are very demanding, either on CPU (we do the search twice)
or on memory (we do the search on the server and gather all the results).
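The trade-off between the two options can be sketched in Python as
follows. This is only an illustration: `run_search`, `count_then_replay`
and `buffer_everything` are hypothetical names, and the `stats` counter
stands in for the CPU work of evaluating each candidate.

```python
from typing import Callable, Iterator, Tuple

stats = {"evaluated": 0}  # counts how many candidates were examined


def run_search() -> Iterator[str]:
    """Hypothetical search: 1000 candidates, one in ten matches."""
    for i in range(1000):
        stats["evaluated"] += 1
        if i % 10 == 0:
            yield f"entry{i}"


def count_then_replay(
        search: Callable[[], Iterator[str]]) -> Tuple[int, Iterator[str]]:
    """Option 1: search twice. Memory stays flat, CPU cost doubles."""
    total = sum(1 for _ in search())  # first pass: count, keep nothing
    return total, search()            # second pass: stream the entries


def buffer_everything(
        search: Callable[[], Iterator[str]]) -> Tuple[int, Iterator[str]]:
    """Option 2: search once. Every matching entry is held in memory."""
    buffered = list(search())
    return len(buffered), iter(buffered)


total, stream = count_then_replay(run_search)
entries = list(stream)
print(total, len(entries), stats["evaluated"])
```

With the first strategy every candidate is evaluated twice by the time
the client has read all entries; with the second it is evaluated once,
but the whole result set sits in a list until it has been sent.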

Btw, if you think about what most people do when implementing a web page
where a list of results is presented, with arrows to go back and forth,
with the data in an RDBMS, they usually do a SELECT COUNT(*) first, to
know the size of the result set, and then do the so-called "real
search". In fact, there is nothing like a "real search": SELECT
COUNT(*) *DOES* process the exact same search, so at the end of the day,
you are doing the same search twice. Very inefficient, and it makes the
DBA yell at you...
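For reference, the RDBMS pattern described above looks roughly like this
in Python with the standard `sqlite3` module. The `users` table, its
columns and the data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
conn.executemany("INSERT INTO users (id, active) VALUES (?, ?)",
                 [(i, i % 2) for i in range(1, 201)])

where = "active = 1"  # the same predicate is evaluated by both queries

# First query: only to learn the size of the result set.
total = conn.execute(
    f"SELECT COUNT(*) FROM users WHERE {where}").fetchone()[0]

# Second query: the so-called "real search", for results 51 to 60.
page = [row[0] for row in conn.execute(
    f"SELECT id FROM users WHERE {where} ORDER BY id LIMIT 10 OFFSET 50")]

print(total, page)
```

Both statements apply the same WHERE clause to the same table, so the
matching work is done twice.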

Last but not least, if you want to jump to a given position in the
result set, there is no other option: you will have to keep the result
set in memory on the server. Memory is a limited resource, and when you
run out of it, very bad things happen to the server, whereas when you
reach 100% CPU, the server just becomes unresponsive for a while. We
would always favor burning CPU over memory, if faced with such a
choice...

Ah, btw, there is no way to go backward in a paged search request: you
can only move forward...

Hope it helps...

