On Wed, Aug 20, 2008 at 1:15 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:
Alex Karasulu wrote:
Hi,

I'm working on the search limits handling code now.  I'm looking into the
interaction between persistent search and these limiting parameters for size
and time. I've made some judgment calls that are common sense based but I
wanted to inform the list in case they may not be the correct decisions.
Here's the current behavior I am coding in the SearchHandler.

<temporary-vocabulary>
non-persistent component :
    The portion of search request processing that returns matching entries
    before the listening for changes begins, when the persistent search
    control is present and its changesOnly parameter is set to false.

persistent component :
    The portion of search request processing that returns changes when the
    persistent search control is present.
</temporary-vocabulary>

(1) A normal search request has its time and size limits enforced.
(2) A persistent search request has its time and size limits enforced on the
non-persistent component of the request.  The time and size limits do
*NOT* apply to the persistent component of the request.

The reasoning behind this is simple.  Search time and size limits were
created in part to protect the server and in part to protect the client from
excessive processing and oversized results.  The processing-intensive
component is the non-persistent part of the persistent search when
changesOnly is false.  The persistent component of the search is intended
never to end, so why restrict it with time or even size limits?
 
Well, the size and time limits are optional parameters.  If one does not want to limit a persistent search, he simply has to set those values to 0, and that does the trick.

Why do we have to force something that was designed to be configurable?  Not that what you say is unreasonable, but who knows?

Size and time limits can be set per request, as the protocol allows, yes.  But a configurable upper bound on the size and time parameters exists in server.xml to prevent unbounded searches from being issued.  Administrators are not limited by these defaults, but normal users are, and for good reason: it constrains poorly formed search requests.  Consider a search over 10M entries under some base, with a scope and filter that return all 10M entries.  It is not a good idea to let regular users simply return everything under that base.  Imagine what 1000 of these searches would do to the server if they were all conducted at the same time.  Plus, someone looking up a single entry while this is occurring will have to deal with a lot of latency thanks to the excessive load.
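
For illustration, here is a hedged sketch of how a requested limit might be
reconciled with that configured maximum.  The names and the administrator
bypass are assumptions for the example, not the actual server code:

    // Hedged sketch: reconciling the client's requested limit with the
    // server-wide maximum from server.xml.  All names are illustrative.
    class EffectiveLimitSketch
    {
        static final long NO_LIMIT = 0;  // LDAP convention: 0 = "no limit requested"

        static long effectiveSizeLimit( long requested, long serverMax, boolean isAdmin )
        {
            if ( isAdmin )
            {
                return requested;  // administrators bypass the configured maximum
            }
            if ( requested == NO_LIMIT )
            {
                return serverMax;  // an unbounded request from a regular user is capped
            }
            return Math.min( requested, serverMax );  // otherwise the smaller bound wins
        }
    }

The same kind of reconciliation would apply to the time limit.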

Granted, this is not the best strategy for dealing with QoS, but having this simple configurable upper bound for regular users, one the system administrator can control, is a good idea.

I hope this answers your question.

One last point.  Once the throttling issue is resolved, we can have a real QoS strategy, thanks in part to Cursors, which store the state of a search and can be tucked away, retrieved, and the search restarted.
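
To sketch that tuck-away-and-restart idea: the Cursor interface below is a
minimal stand-in modeled on the real one, while the resume cookie and the
saved-search registry are purely hypothetical illustrations:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Minimal stand-in for the real Cursor interface, which has more methods. */
    interface Cursor<E>
    {
        boolean next() throws Exception;  // advance; false once results are exhausted
        E get() throws Exception;         // the entry at the current position
        void close() throws Exception;
    }

    class SavedSearchSketch<E>
    {
        // parked searches, keyed by a hypothetical resume cookie given to the client
        private final Map<String, Cursor<E>> saved = new HashMap<String, Cursor<E>>();

        /** Tuck the search away; the cursor itself holds the search state. */
        void suspend( String cookie, Cursor<E> cursor )
        {
            saved.put( cookie, cursor );
        }

        /** Retrieve the cursor and restart the search exactly where it stopped. */
        List<E> resume( String cookie, int batchSize ) throws Exception
        {
            Cursor<E> cursor = saved.get( cookie );
            List<E> batch = new ArrayList<E>();

            while ( batch.size() < batchSize && cursor.next() )
            {
                batch.add( cursor.get() );
            }

            return batch;
        }
    }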

Alex