From: Chuck Murcko <chuck@telebase.com>
Message-Id: <199605310135.VAA02188@telebase.com.>
Subject: Re: An idea for state saving
To: new-httpd@hyperreal.com
Date: Thu, 30 May 1996 21:35:39 -0400 (EDT)
In-Reply-To: <199605302228.SAA01543@volterra.ai.mit.edu> from "Robert S. Thau"
 at May 30, 96 06:28:06 pm
Content-Type: text
Sender: owner-new-httpd@apache.org
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

Robert S. Thau liltingly intones:
> 
>   What's stopping the query output from being written to, say, 10 pages of
>   10 results each, all linked with numbers and Next/Prev. Point the user
>   to the first page and there you go.
> 
> Hey, *any* common resource which can be safely written and accessed by
> all of the web server child processes could be used this way, given a 
> little coding, though the file server probably is the path of least
> resistance.  As to what Alta Vista does, or how, I don't honestly know,
> and I'm a bit curious...
> 
Me too. All I've read is that part of the search performance is due to
brute force: 6 or so GB of RAM holds the database index. That's
expensive, but it works. How they generate their pages for final
delivery is still unknown to me. Their pages are persistent, because
I can set my browser cache to zero and still move back and forth among
the various pages delivered from the query. I'd suspect something like
a very fast, very large file server or shared disk array.
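
For what it's worth, the "10 pages of 10 results each" idea quoted above
is easy to sketch. This is only an illustration in Python, not a claim
about how Alta Vista does it; the function name, the page file names and
the HTML layout are all made up for the example.

```python
import html
from pathlib import Path

def write_result_pages(results, out_dir, per_page=10):
    """Write results as static pages linked by number and Prev/Next.

    Hypothetical helper: returns the path of the first page, which is
    where the user would be pointed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Split the result list into chunks; keep one empty page if no hits.
    chunks = [results[i:i + per_page]
              for i in range(0, len(results), per_page)] or [[]]
    names = [f"page{n}.html" for n in range(1, len(chunks) + 1)]
    for idx, chunk in enumerate(chunks):
        items = "\n".join(f"<li>{html.escape(r)}</li>" for r in chunk)
        nav = []
        if idx > 0:
            nav.append(f'<a href="{names[idx - 1]}">Prev</a>')
        for n, name in enumerate(names, start=1):
            # The current page number is shown as plain text, not a link.
            nav.append(str(n) if n == idx + 1 else f'<a href="{name}">{n}</a>')
        if idx < len(names) - 1:
            nav.append(f'<a href="{names[idx + 1]}">Next</a>')
        start = idx * per_page + 1
        body = f'<ol start="{start}">\n{items}\n</ol>\n<p>{" ".join(nav)}</p>\n'
        (out / names[idx]).write_text(body)
    return out / names[0]
```

Any child process can then serve any of these pages, since the state
lives in the files rather than in the process that ran the query.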

>   This is not multiple simultaneous updates to the database, is it? Just
>   queries, right?
> 
> Yep... multiple queries to a common database back end can be a problem.
> F'rinstance, let's say that you have multiple server processes talking
> to some kind of back end (database, search engine, whatever) through
> a common pipe.  One of these server processes writes a query to the
> pipe.  Subsequently, it reads a result.  However, this may not be the
> result of the query it made --- it could be the result of another query
> which another child sent simultaneously down the pipe.
> 
> (You can detect these situations by stamping IDs on the queries and
> results, but then it gets really tricky to deliver the misrouted
> response to the child process that asked for it... and all of this
> gets even more fun if the queries are written in pieces, and get
> intermixed in the pipe).
> 
Yes indeed. The classic multiplexer approach. It *is* tricky, but it's
generally the least expensive from a hardware standpoint. You put your
$$$ into one (or several) extremely fast channels to the back end
database engine.
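
A rough sketch of the ID-stamping multiplexer, in Python. The
assumptions are loud ones: threads stand in for the server children, a
queue stands in for the shared pipe, and `backend` is a hypothetical
callable playing the database. Threads share memory, so this sidesteps
exactly the hard part mentioned above (getting a response back to a
separate child process); it only shows the routing idea.

```python
import itertools
import queue
import threading

class Multiplexer:
    """Share one back-end channel among many workers by stamping IDs."""

    def __init__(self, backend):
        self._backend = backend            # hypothetical back-end callable
        self._channel = queue.Queue()      # the single shared channel
        self._pending = {}                 # query ID -> caller's reply queue
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        threading.Thread(target=self._serve, daemon=True).start()

    def query(self, text):
        """Send one query and block until *its* result comes back."""
        reply = queue.Queue(maxsize=1)
        with self._lock:
            qid = next(self._ids)
            self._pending[qid] = reply
        # The ID and the query go down the channel as one unit, so
        # queries from different workers cannot be intermixed.
        self._channel.put((qid, text))
        return reply.get()

    def _serve(self):
        """Read stamped queries, run them, route each result by ID."""
        while True:
            qid, text = self._channel.get()
            result = self._backend(text)
            with self._lock:
                reply = self._pending.pop(qid)
            reply.put(result)
```

Usage would be along the lines of `mux = Multiplexer(run_query)` once at
startup, then `mux.query("...")` from each worker, where `run_query` is
whatever actually talks to the database.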

> One way of dealing with this is by just making sure that each child
> has its own channel to the back end (e.g., opens its own socket, in the
> cases of the ILU requester and FastCGI).
> 
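
The one-channel-per-child arrangement might look something like the
following Python sketch. The back end here is a stand-in that just
upper-cases each newline-terminated query, and the class names are
invented for the example; this is not the ILU or FastCGI wire protocol.

```python
import socket
import socketserver
import threading

class EchoBackend(socketserver.StreamRequestHandler):
    """Stand-in back end: answers each query line in upper case."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line.upper())

def start_backend():
    """Start the stand-in back end on a free local port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoBackend)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

class Child:
    """One server child holding its own private socket to the back end."""

    def __init__(self, address):
        self._sock = socket.create_connection(address)
        self._file = self._sock.makefile("rwb")

    def query(self, text):
        # Nobody else writes to or reads from this socket, so the next
        # line read is always the answer to the query just written.
        self._file.write(text.encode() + b"\n")
        self._file.flush()
        return self._file.readline().decode().rstrip("\n")

    def close(self):
        self._file.close()
        self._sock.close()
```

No IDs and no routing are needed, at the cost of one open connection
per child on the back end.
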
I'd venture to extend this model one layer deeper. The difference from
your description is that the database engine sits two layers back from
the front-end httpd machines. The intermediate layer of machines, each
of which gets its own channel into the database, does the actual
assembly of on-the-fly generated pages from static content held locally
(not necessarily HTML, but perhaps SGML) and the pointers returned from
the database queries. Bulk static content is provided to the httpd
machines in front from a disk farm shared with the second-tier machines.

It'd be darned expensive, but it would scream.
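
To make the layering concrete, here is a toy Python sketch with each
tier as a plain function. Everything in it is an assumption for
illustration: the dictionaries stand in for the disk farm and the
database index, and the paths, template and function names are made up.
In real life each tier would be its own set of machines.

```python
# Stand-in for the disk farm shared by tiers 1 and 2 (bulk static content).
SHARED_DISK = {
    "/docs/1.html": "Apache docs",
    "/docs/2.html": "httpd FAQ",
}

# Static content held locally on the middle-tier machines.
LOCAL_TEMPLATE = "<h1>Results for {term}</h1>\n<ul>\n{items}\n</ul>"

# Stand-in for the database index: search term -> pointers (paths).
INDEX = {
    "apache": ["/docs/1.html"],
    "httpd": ["/docs/1.html", "/docs/2.html"],
}

def database_engine(term):
    """Tier 3: return pointers only, never the content itself."""
    return INDEX.get(term.lower(), [])

def assemble_page(term):
    """Tier 2: merge local static content with the returned pointers."""
    items = "\n".join(
        f'<li><a href="{p}">{SHARED_DISK[p]}</a></li>'
        for p in database_engine(term)
    )
    return LOCAL_TEMPLATE.format(term=term, items=items)

def front_end(path):
    """Tier 1: serve bulk static content directly, pass queries back."""
    prefix = "/search?q="
    if path.startswith(prefix):
        return assemble_page(path[len(prefix):])
    return SHARED_DISK.get(path, "404 Not Found")
```

So `front_end("/docs/2.html")` never touches the database, while
`front_end("/search?q=httpd")` goes through both back tiers.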

chuck
Chuck Murcko	N2K Inc.	Wayne PA	chuck@telebase.com
And now, on a lighter note:
Brooks' Law:
	Adding manpower to a late software project makes it later