httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Francis Daly <>
Subject Re: [patch] IndexResults
Date Mon, 27 Jan 2003 09:37:23 GMT
On Sat, Jan 25, 2003 at 02:06:26PM -0800, Justin Erenkrantz wrote:
> --On Saturday, January 25, 2003 9:23 PM +0000 Francis Daly 
> <> wrote:

Hi there,

thanks for your input.

> >+/* res_info is used when merging res_list */
> >+typedef struct res_info {
> >+    int *range;
> >+    apr_table_t *tab;
> >+} res_info;
> I think this is the wrong data structure to be using here.  A table 
> doesn't make a whole lot of sense.  It's primarily used to deal with 
> strings not integers.  The 'native' status and true/false conditions 
> are best done with an integers for keys.  

I agree that integers would be much better to use.  With my first few
tries at an implementation based on them, though, I couldn't come up
with a sensible way of processing all IndexResults directives.  Possibly
I was trying to be too clever, allowing too much flexibility in the

I'll try to show what I mean using your data structure below -- although
the answer may be "don't allow such freedom in configuring the thing".

But now that I actually write it down, it looks like it may not be much
of a problem after all -- a better data structure makes a big difference
(as always).

> This patch does way too many unsafe things with strings for my liking.

"Unwise" I'll accept, but I thought I had been careful to avoid
"unsafe".  No matter: if it can work with integers, that's the way to go.

> You can key off the first digit (easy to get) and only have it 
> restricted to that region of the number space (create buckets).  For 
> series that cross sequences, you can split them up at 100-intervals. 

At least for the initial config intention, there shouldn't be any of
those.  Either "###" meaning "just that one", or "#xx" meaning "range
from #00 to #99".

> (I'd be content to lose one bucket so that we don't have to subtract 
> the status on the request.)  Allows and denies are separated into two 
> lists making it easy to keep the ordering (where allows are always 
> before denies).

The problem I ran into when trying to implement something along those
lines, was how to manage individual deny rules.  That's how I ended up
with separate "range" and "individual" lists.

Using your outline, merging a deny (or group of denies) will potentially
involve quite a bit of fiddling with the allow list.

Contrived example: one directory has IndexResults 4xx (or whatever the
syntax for "all 400-series" is).  A subdirectory has IndexResults -401
-406.  A subsubdirectory has IndexResults 405 406 407.

Creating the per-directory configs is easy -- the first is a single big
range, the second is two 1-element ranges, and the third is either three
1-element ranges, or possibly one 3-element range.  Merging them is where
I couldn't find a straightforward algorithm (although, with the right
data structure, it appears much easier).

The top directory here has "allow" of 400-499, and "deny" of nothing.
The middle directory has to have "deny" of 401,406, and so will
presumably have to switch "allow" to 400,402-405,407-499.  The bottom
directory will add 405,406,407 to allow (it won't have to touch deny, as
the allow list is tested first).  It may be able to be clever and notice
that it can change "allow" to 400,402-499, but that's not something
needed right now.

Basically, if "allow" is always read first, then the merging of "deny"
must remove things from "allow", adjusting the entire chain as it goes.

Now, I expect that the common case will be IndexResults 401 set
globally, (or not at all) and not modified in directories, so the
"allow" chain will be 300-399,401-401 (or 300-399) everywhere.  Given
that, the performance of a merge compared to a search may not be an
issue at all.

> What I would envision would be something like this:
> /* res_info is used when merging res_list */
> typedef struct range_info {
>   int start;
>   int end;    /* start == end indicates a singleton. */
>   struct range_info *next;
> } range_info;
> typedef struct range_list {
>  range_info allow[10];
>  range_info deny[10];
> } range_list;

Looks good.

The merge algorithm then becomes: for any IndexResults Nxx, we remove
all elements in allow[N], and then add allow[N] = {N00, N99, NULL}.

For any IndexResults -Nxx, we remove all elements in allow[N], and
remove all elements in deny[N], and then add deny[N] = {N00, N99, NULL}.
Actually, with that, do we even need a deny[]?  "If it's not in allow[],
deny it" might be more straightforward -- especially if allow[] will
always be checked first anyway.

Assuming there is no deny[], then for any IndexResults N##, check if
it's in the allow[N] chain, and if not, add it (by incrementing .end
or decrementing .start of the appropriate element, or adding a new
singleton element, and adjusting .next pointers accordingly).  Or, just
add it to the end of the chain, and leave any future [-]Nxx directive to
remove it when the time comes.

And for any IndexResults -N##, check each element of the allow[N] chain.
If the N## case is "add to the end of the chain", then for each element
of allow[N], split any ranges that include N## into two (by creating a
new link, and adjusting a .next pointer), and remove any singletons N##,
also adjusting a .next pointer in the previous link.  Alternatively, if
the N## case was the more-work "keep the chain ordered with no overlap",
then the -N## case becomes "find the link that should include it, and
split or remove that one if it's there".

It may come down to a guess as to whether N## or -N## is more likely,
but I suspect that keeping the chain ordered without overlapping entries
will be tidier.  Especially if pool-magic means that we can just create
new links without having to worry about correctly de-allocating older
ones which are no longer in the chain.

(Of course, because the chain involves following pointers, any "merge"
operation will have to generate an entirely new chain, at least for
each affected allow[N].  Shouldn't be too much trouble, especially as it
shouldn't be the "usual" code path.)

> But, we ought to keep the comparisons as integers.  This would be 
> reasonably efficient to compare (tables are awful) and is directly 
> proportional to the number of IndexResults directives you have.  I 
> think it's a fair tradeoff with a much cleaner implementation.

I agree.  I used strings because, basically, I couldn't come up with a
better data structure.  I guess I should have tried harder, or asked

> The comparison would simply look like (pseudocode):
> current = ranges[r->status % 100];
> while (current) {
>  if (r->status >= current.start && r->status <= current.end) {
>    return 1;
>  }
>  current =;
> }
> That's way more efficient than doing a sprintf.  Walk it twice for 
> denials and approvals.
> What do you think?  -- justin

I like, although I think a single ordered allow list should be
sufficient, both in terms of "possible configs" and "probable configs".
(Hopefully I've properly understood your suggestion.)

Can you see any glaring holes in the implementation plan outlined above?

If not, I'll start writing.


Francis Daly

View raw message