httpd-users mailing list archives

From "bruce" <bedoug...@earthlink.net>
Subject RE: [users@httpd] Serving remote files from Apache...
Date Thu, 01 Jan 2004 19:42:29 GMT
Jonas/Nick,

Thanks for the response...

Our project would have 100s of modified Apache client apps, each of which
would have to communicate with the Master/Central app to get the required
pages/data to serve up. To keep this approach simple, we're going to
consider static content only. There will be no processing of scripts/etc on
the client Apache PCs...


That said, the following are responses to the issues you raised...

1.1: How often does the configuration change?
     The configuration for the Apache clients should be able to be updated
in a dynamic fashion. It is critical that the modified Apache clients be
able to access this information from the Central system. This gives us
control over how the client app is configured... Another way might be to
have the config information periodically downloaded to the client machine,
and then have the Apache client "load" it. This would require the config
information be in a different format than the straight text file... There
would also be security considerations...

	For ease of use.. it seems the "easiest" approach would be to "suck" the
config information into the Apache client...


1.2: How often must the Apaches update their configuration?
	See above...

1.3: How critical is it that *all* the Apaches *always* contain the latest
configuration data?
	Each Apache client may in fact contain a different "config". But it's
important that they be updated with the latest as soon as possible. This of
course will require some central app/process that essentially monitors when
"config" information is changed, as well as the configs of each Apache
client, and then which Apache clients get what configs....
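
A rough sketch of the bookkeeping such a central process might do: record which
config profile each client should run and which version it last applied, then
list the clients that are behind. The profile names, client IDs, and version
numbers are hypothetical, and integer versions are only an assumption about how
configs would be tracked.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClientState:
    profile: str          # which config profile this client should run
    applied_version: int  # last config version the client reported applying


def stale_clients(latest: Dict[str, int], clients: Dict[str, ClientState]) -> List[str]:
    """Return the IDs of clients whose applied config is older than the
    latest version published for their profile."""
    return sorted(
        client_id
        for client_id, state in clients.items()
        if state.applied_version < latest.get(state.profile, 0)
    )


# Hypothetical data: two profiles, three clients.
latest_versions = {"kiosk": 7, "office": 3}
clients = {
    "client-a": ClientState("kiosk", 7),
    "client-b": ClientState("kiosk", 5),
    "client-c": ClientState("office", 2),
}

print(stale_clients(latest_versions, clients))  # ['client-b', 'client-c']
```

Because each client is compared only against its own profile, different
clients can legitimately run different configs.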


1.4: Will all the Apache installations be identical in *all* ways?
     See 1.3...


1.4.2: Will the OS installations for all the Apaches be identical?
     We are considering Linux/Windows variant OSes for the Apache client
apps.

2.1: How much data are we talking about?
     You had to ask...!!! For a given site, we believe that they will
average ~10-20 pages of text... We'd guess ~5-10K.. We don't really know,
but we do know we're not talking large quantities of data for the sites...


2.2: Is this dynamic data or static data? (The usefulness of mod_cache
depends on this).
     The data in the sites/pages will be static... However, we do not (at
least initially) want to cache any page/site content on the user's PC any
more than we have to. There are a couple of reasons for this, primarily
that we don't want a user looking at their hard drive and seeing
20-30 Meg of data due to us!!

     We have considered using temp files/directories as needed...

     We are also open to using/considering the use of caching methods,
provided we also incorporate some form of encryption for the content. This
would ensure that the underlying site page/content data is "safe" and has
not been altered...

     This could also probably be accomplished with some computed hash.. that
is then checked against the data prior to it being served by the client
Apache, for site data that is cached...
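
As a minimal sketch of that hash check, assuming the Central system publishes a
digest for each page and the client fetches it over the secure channel: SHA-256
is an assumed choice, and the function names and sample page are hypothetical.

```python
import hashlib
import hmac


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a page body."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(cached: bytes, expected_digest: str) -> bool:
    """True only if the cached bytes still match the digest that the
    Central system published for this page."""
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(content_digest(cached), expected_digest)


# Hypothetical page body and the digest the Central system would publish.
page = b"<html><body>Hello</body></html>"
published = content_digest(page)

print(is_unaltered(page, published))                  # True: untouched cache entry
print(is_unaltered(page + b"<!-- x -->", published))  # False: modified on disk
```

The check only helps if the digest itself cannot be tampered with on the
client, so it would have to come from the Central system rather than be stored
next to the cached file.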


2.3: Will there be scripts or other dynamic stuff that the Apaches are
supposed to fetch and then execute themselves?
     Nope.. For now, we are only going to deal with static content...


2.4: What kind of hardware (CPU numbers and speed, disk size, amount of RAM)
will the Apaches be installed on?
     This will vary. For design purposes, assume a minimal machine: 500 MHz
CPU, 10 GB hard disk, 256 MB RAM.


2.5: What kind of connections will the Apaches have to the remote machines?
     The modified Apache clients will have to access the Master/Central
app/system through a secure process. Any data/communication between the
Central system and the client Apache apps will have to go through a
verification/validation process...


2.6: What OS will the Apaches' machines and the remotes run? (This can be
important when considering mounting directories remotely.)
     see above...

     However, given the number of clients that we will be shooting for, if
we develop this!!, we would not like to mount directories internal to the
Master app/system environment that would be open to the outside clients...

     Our initial concern regarding bottlenecks really had to do with
understanding the issues surrounding the mod_proxy process. We're still
not comfortable with this approach given that it would require the Master
App/System to be running an Apache-based app, which would need
its own config file.

     This config file would have to be updated quite frequently to accommodate
changes made to the client Apache config information, for the mod_proxy
sections...

     These changes appear to be required, as we may not want the same modified
Apache client to serve the same sites on a continual basis...


This seems to be a possible working approach, assuming that
the Master Server/System apps can be created, as well as the monitoring app
and the requisite software apps. The real issue would seem to be the ability
of the Master System to handle the quantity of requests for information
that will be coming from the client Apache apps. This could be accomplished
by a couple of cabinets of rack servers with the appropriate round robin
apps to facilitate this process...
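
To illustrate what round robin means here, a small sketch that hands client
requests to a pool of Master servers in strict rotation. The server names are
hypothetical, and a real deployment would more likely rely on DNS or a load
balancer than on code like this.

```python
from itertools import cycle
from typing import Callable, Iterable


def make_picker(servers: Iterable[str]) -> Callable[[], str]:
    """Return a function that yields the next server in rotation on each call."""
    pool = list(servers)
    if not pool:
        raise ValueError("at least one server is required")
    ring = cycle(pool)
    return lambda: next(ring)


# Hypothetical pool of three Master servers.
pick = make_picker(["master-1", "master-2", "master-3"])
print([pick() for _ in range(5)])
# ['master-1', 'master-2', 'master-3', 'master-1', 'master-2']
```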

If you're interested in having a further discussion, or in perhaps joining
what we're thinking about doing, we would be interested in talking with you.

At present this is an idea that's being refined.

Thanks for your assistance...

Regards,

Bruce Douglas
bedouglas@earthlink.net



-----Original Message-----
From: Jonas Eckerman [mailto:jonas_lists@frukt.org]
Sent: Thursday, January 01, 2004 8:50 AM
To: users@httpd.apache.org
Subject: Re: [users@httpd] Serving remote files from Apache...


On Wed, 31 Dec 2003 12:04:52 -0800, bruce wrote:

> We've been investigating/searching for an answer to a problem... We've seen
> possible solutions, but nothing that's been exact!..

That's the way things usually work out. When searching for a solution you
need to decide which of the following is the most important:

1: To get the perfect solution that fits your preconceptions of the solution
exactly. This often means you'll have to write the necessary code yourself.

2: To get a working solution that does solve the *original* problem, even if
not the way you envisioned when you set out and even if it means
reconsidering some things. This can often be done using existing solutions
in creative ways, and is often the approach that works best.

Note my use of *original* problem above. It's very common that people first
have a rather abstract problem. Then they start thinking of solutions. When
they think they're on the right track they start investigating how to
implement the solutions using existing means. If you stumble at this stage,
it's often a very good idea to step back to the first stage and see if there
are alternative ways to solve the original (often rather abstract) problem.

It's also very useful to explain the original problem when asking for help.
When people know the original problem, they may sometimes come up with
surprising solutions that one would never have thought about oneself.

Currently, you have explained a solution to a problem and have asked for
help and tips about implementing that solution. But you have not explained
what problem your solution is meant to solve.

I think you also need to separate your two different problems. You are
actually looking for solutions to two completely separate problems. The
solutions may therefore be completely separate as well. The problems:

1: Get the Apache machines to fetch their configuration from a central
place.

2: Get the Apache machines to serve content stored on remote machines.

To me both problems seem rather easy to deal with using existing solutions,
but as I don't have all the info on exactly what you're trying to do they
might indeed be difficult problems. Some info that'd be very good to have
about both problems:

1.1: How often does the configuration change?
1.2: How often must the Apaches update their configuration?
1.3: How critical is it that *all* the Apaches *always* contain the latest
configuration data?
1.4: Will all the Apache installations be identical in *all* ways?
1.4.2: Will the OS installations for all the Apaches be identical?

2.1: How much data are we talking about?
2.2: Is this dynamic data or static data? (The usefulness of mod_cache
depends on this).
2.3: Will there be scripts or other dynamic stuff that the Apaches are
supposed to fetch and then execute themselves?
2.4: What kind of hardware (CPU numbers and speed, disk size, amount of RAM)
will the Apaches be installed on?
2.5: What kind of connections will the Apaches have to the remote machines?
2.6: What OS will the Apaches' machines and the remotes run? (This can be
important when considering mounting directories remotely.)

>  However, we'd actually like to have the website/page files reside
>  on the remote PC/Harddrives and to basically have them read from
>  the remote machine, and served via the Apache app...

Which you can do with Apache's mod_proxy.

Honestly, I don't understand why you don't want to use mod_proxy for this.
You need to use some kind of transport protocol to have Apache fetch the
files from a remote machine. You've already stated that you do not wish to
use NFS. Why not use HTTP? Do you have any other particular protocol for
fetching the files that you'd prefer to HTTP? Are the Apaches supposed to
fetch executable code (PHP pages, CGIs, etc) and execute it (mod_proxy won't
do this)?

>         Apache Server (http://12.13.14.15) ip address
>              |  ^
>              |  |
>              |  |
>              V  |
>         Remote PC/Server

> If this is not easily doable, and we need to utilize the ProxyPassReverse
> solution, what issues are involved?

This can of course be done, but you will need some way for Apache to fetch
the files from the remote server. With mod_proxy, Apache already contains
the functionality that you describe (as I interpret it). If, for some as yet
unexplained reason, you do not wish to use mod_proxy you can use some other
method. You could mount (not necessarily through NFS) the remote machine's
directories on the Apache machine, or you can implement your own module to
get Apache to fetch them.

If you can accept the use of mod_proxy (which does exactly what you want) but
don't want Apache to fetch the files with HTTP, you can use mod_proxy_ftp
instead and have Apache fetch the files with FTP, or you can implement your
own module to get Apache's mod_proxy to use some other protocol.

One thing you have not explained is why you want Apache to do the fetching
of both content and configurations. Why not let the OS do this?

> In particular, how scalable would the
> ProxyPassReverse approach be? We might need to serve potentially 1000's of
> sites with this approach...

1000's of different web sites (meaning Apache will fetch from 1000's of
different remote machines)? Or maybe 1000's of different Apaches that will
all fetch from just a few remote machines? Or 1000's of something else?

* If you'll have 1000's of Apache installations all fetching from just a few
remote machines:

You've already created scaling problems as those few remote machines can
get extremely heavily loaded.

If this is the case, you should use mod_cache in the 1000's of Apache
installations in order to lower the load on the central remote machines.
With mod_cache this scheme should be able to scale very well.

One problem, even when using mod_proxy in this case, is that mod_proxy just
fetches data and sends it on to clients. If you're using PHP, ASP, CGI or
similar stuff, the remote machine is the one that'll have to execute this
code, because Apache with mod_proxy just fetches it and serves it as-is. If
this is what you're doing (or might be doing), mod_proxy will create
problems rather than solve them. This means you will probably be better off
letting the OS handle the actual fetching of the remote files, and use a
more standard Apache that doesn't really know whether the files are stored
locally or remotely.

OTOH, you did say you wanted small stripped Apaches, so I guess you do not
want them to be able to execute CGIs, PHP, ASP or other similar stuff
anyway.

* If you'll have 1000's of different virtual hosts in each Apache or 1000's
of remote machines:

I'm not sure where exactly the scaling problems will be, except that you
really should look at the modules for mass virtual hosting if you do this.

Regards
/Jonas

--
Jonas Eckerman, jonas_lists@frukt.org
http://www.fsdb.org/


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

