httpd-dev mailing list archives

From "Dirk.vanGulik" <Dirk.vanGu...@jrc.it>
Subject Re: Suggestion: Access-info for search engines (fwd)
Date Tue, 01 Apr 1997 15:10:54 GMT

> If this interests anyone, talk to him directly.

I can see why one would like this, but it also opens up a lot
of access to information you do not necessarily want to be
visible.

Plus, I'd rather see a requirement like this one
handed down from the esteemed heights of the IETF (or
the W3C, for that matter) as part of the suggestions for
HTTP protocol enhancements flying around to make searching
and indexing easier.

Dw.


 
> ---------- Forwarded message ----------
> Date: Sat, 29 Mar 97 16:20:15 -0600
> From: Fred Lindberg <lindberg@id.wustl.edu>
> To: "apache-bugs@apache.org" <apache-bugs@apache.org>
> Subject: Suggestion: Access-info for search engines
> 
> Hi:
> 
> On the htdig list (htdig@sdsu.edu) we have discussed the following problem:
> 
> 1. Server in domain 'x' serves general docs and docs restricted to domain 'x'.
> 
> 2. Search engine in domain 'x' indexes docs.
> 
> 3. 'Searcher' from outside of domain 'x' gets hits on docs restricted to 'x'
> and quite a bit of info that s/he shouldn't get.
> 
> A cooperative search engine needs to get access control info on the docs it
> managed to retrieve, so that it can give 'searcher' data only on docs that
> 'searcher' has access to. I started to write a perl script that would parse
> Apache config and .htaccess files, then add "META" tags with this info to the
> docs. Very messy, hard to keep in synch, etc. Instead, I started to modify
> mod_access.c to generate headers with this info. I really don't know enough
> about this, or where (other than here) to get comments/info on how to do this
> right and if it is worth doing in the first place. It seems to be a general
> problem of retransmitting info, applicable not only to search engines.
> 
> Enclosed is a diff against Apache_1.2b7 mod_access.c (the comments are only
> temporary).
> 
> -
> Sincerely, Fred
> 
> Frederik Lindberg
> Infectious Diseases, 8051
> Washington University School of Med
> 660 S Euclid Ave
> ST. LOUIS, MO 63110
> 
> 
> *** mod_access.c.orig	Sat Mar 29 12:14:01 1997
> --- mod_access.c	Sat Mar 29 16:02:24 1997
> ***************
> *** 50,55 ****
> --- 50,80 ----
>    *
>    */
>   
> + /* Modified 1997-03-29 Fred Lindberg (lindberg@id.wustl.edu)
> +  * (c) for my changes abandoned 1997
> +  *
> +  * Purpose: Local search engines retrieve and index documents, then
> +  * make the results available to searchers. The search results often
> +  * display a substantial part of the info in the document. This is bad
> +  * if the search engine has access to docs that the searcher should not
> +  * be able to see (e.g. local-only docs, local search engine, foreign
> +  * searcher). However, to honour these restrictions, the search engine
> +  * needs to know about access restrictions on documents successfully
> +  * retrieved.
> +  *
> +  * Changes: These changes collect 'order', 'allow', and 'deny' info as
> +  * mod_access sees it for the current document, and add it as
> +  * 'Access-order:', 'Access-deny:', and 'Access-allow:' headers. This
> +  * is done only for successfully retrieved docs.
> +  *
> +  * Warranties: none. I don't know enough to do this, but wanted to get
> +  * it started. I don't know what the 'Env=' does in find_allowdeny, but
> +  * just copied it. Also, the user-agent restriction?
> +  * Should access headers be done only for 'GET'? How does one
> +  * "register" headers? Should they be 'Pragma:' headers instead? Is
> +  * there a better way?
> +  */ 
> +  
>   /*
>    * Security options etc.
>    * 
> ***************
> *** 224,229 ****
> --- 249,291 ----
>       return 0;
>   }
>   
> + /* -------------------------------------------------------------------
> +    Added 970329, Fred Lindberg (lindberg@id.wustl.edu).
> +    This routine is called like find_allowdeny, but returns a char*
> +    to a pool string of the ' '-separated access items.
> +    These are collected and sent as a header to give the recipient
> +    info on access control, so that this site can respect it in search
> +    result listings etc. Note: This is only sent if the client itself
> +    has access to the doc.
> +    -------------------------------------------------------------------*/
> + char *access_hdr (request_rec *r, array_header *a, int method)
> + {
> +     allowdeny *ap = (allowdeny *)a->elts;
> +     int mmask = (1 << method);
> +     int i;
> +     char *allow = NULL;
> + 
> +     for (i = 0; i < a->nelts; ++i) {
> +         if (!(mmask & ap[i].limited))		/* our method only */
> + 	    continue;
> + 
> + 	if (!strncmp(ap[i].from,"env=",4))	/* What's this? */
> + 	    continue;
> + 	    
> + 	if (!strcmp (ap[i].from, "all")) 	/* all is all */
> + 	    return (char *) pstrdup(r->pool, "all");
> + 
> + 	if(allow) {				/* collect hosts */
> +             allow = (char *) pstrcat(r->pool, allow, " ", ap[i].from, NULL);
> +         } else {
> +             allow = (char *) pstrdup(r->pool, ap[i].from);
> +         }
> +     }
> +     return allow;
> + }
> + /* End modifications */
> + /* -------------------------------------------------------------------*/
> + 
>   int check_dir_access (request_rec *r)
>   {
>       int method = r->method_number;
> ***************
> *** 257,262 ****
> --- 319,347 ----
>       )) {
>   	log_reason ("Client denied by server configuration", r->filename, r);
>       }
> + 
> + /* -------------------------------------------------------------------
> +    Added 970329, Fred Lindberg (lindberg@id.wustl.edu).
> +    -------------------------------------------------------------------*/
> + 
> +     if(ret == OK) {
> +       char *cp = NULL;
> +       char *hdr = NULL;
> + 					/* DENY_THEN_ALLOW is default */
> +       if(a->order[method]==ALLOW_THEN_DENY) { 
> +           table_set (r->headers_out,"Access-order","allow,deny");
> +       } else if(a->order[method]==MUTUAL_FAILURE) {
> +           table_set (r->headers_out, "Access-order","mutual-failure");
> +       }
> + 
> +       if((cp=access_hdr(r,a->denys, method)))
> +           table_set (r->headers_out,"Access-deny", cp);
> +       if((cp=access_hdr(r,a->allows, method)))
> +           table_set (r->headers_out,"Access-allow", cp);
> +     }
> + 
> + /* End modifications */
> + /* -------------------------------------------------------------------*/
>   
>       return ret;
>   }
> 
> 
> 
> 
> 
> 
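
For anyone who wants to see what the consuming side might look like: below is a rough sketch, not part of Fred's patch, of how a cooperating search engine could use the three proposed headers to decide whether a given searcher may be shown a hit. The function names (host_in_domain, list_matches, searcher_may_see) are made up for illustration. It assumes the header values are exactly what the patch emits ("allow,deny" or "mutual-failure" for Access-order, with no header meaning the default deny,allow, and space-separated items for the other two). Host matching is deliberately simplified to "all" plus a domain-suffix test; IP prefixes and the skipped "env=" entries are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does host equal domain, or end with it at a
 * label boundary? A leading '.' on domain is ignored. Simplified
 * assumption; IP-address prefixes are not handled here. */
static int host_in_domain(const char *domain, const char *host)
{
    size_t dl, hl, i;

    if (*domain == '.')
        domain++;
    dl = strlen(domain);
    hl = strlen(host);
    if (dl == 0 || dl > hl)
        return 0;
    for (i = 0; i < dl; i++) {
        if (tolower((unsigned char)domain[i]) !=
            tolower((unsigned char)host[hl - dl + i]))
            return 0;
    }
    return hl == dl || host[hl - dl - 1] == '.';
}

/* Does any item of a ' '-separated header value match host?
 * A NULL list (header not sent) matches nothing. */
static int list_matches(const char *list, const char *host)
{
    char item[256];

    while (list && *list) {
        size_t n;

        list += strspn(list, " ");
        n = strcspn(list, " ");
        if (n == 0)
            break;
        if (n < sizeof item) {          /* over-long items are skipped */
            memcpy(item, list, n);
            item[n] = '\0';
            if (!strcmp(item, "all") || host_in_domain(item, host))
                return 1;
        }
        list += n;
    }
    return 0;
}

/* Returns 1 if a searcher at host may see the document, 0 otherwise.
 * order/allow/deny are the values of the hypothetical Access-order,
 * Access-allow and Access-deny headers, or NULL when absent. */
int searcher_may_see(const char *order, const char *allow,
                     const char *deny, const char *host)
{
    int allowed = list_matches(allow, host);
    int denied = list_matches(deny, host);

    /* "allow,deny" and "mutual-failure" both reduce to: must be
     * allowed and must not be denied. */
    if (order && (!strcmp(order, "allow,deny") ||
                  !strcmp(order, "mutual-failure")))
        return allowed && !denied;

    /* Default deny,allow: an allow match overrides a deny match,
     * and a host matching neither list gets through. */
    return allowed || !denied;
}
```

With the usual "deny from all, allow from wustl.edu" setup, searcher_may_see(NULL, "wustl.edu", "all", "id.wustl.edu") is true while the same call for a host outside that domain is false. Note this only mirrors host-based restrictions; anything protected by user authentication would need separate treatment.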

