www-infrastructure-dev mailing list archives

From Paul Querna <p...@querna.org>
Subject long term goal: reliable services for developers
Date Sun, 23 Aug 2009 02:55:04 GMT
(CC'ed to infra-private to get eyes, please discuss on infra-dev)

The ASF Infra Team set a goal years ago to remove as many single
points of failure on public facing services as possible -- and
today, you see the results.  The websites, version control, and e-mail,
traditionally our 'core' services, are all redundant across multiple
data centers in the United States and Europe.  It was not a quick
process, it was not painless, but today we sit pretty happily for most
public facing services.

The ASF of 5 years ago, when having public facing services redundant
was enough, is not the ASF of today.

We are pushing 2300+ committers, with tens of thousands of accounts
on the wikis and issue trackers.  It is feasible to see the ASF hit
100+ top level projects before long -- so many that there is always
someone doing a release, and development work is always going on.

I believe our next goal should be to make all developer facing
services more redundant and reliable. This includes everything from
the more public facing ones, like issue trackers, to private ones,
like shell accounts.

Recently the disk array on minotaur, aka people.apache.org, has been
having problems.  We can discuss minotaur's ZFS issues in other
threads, but it has exposed how we have abused people.apache.org as a
shortcut to provide many services.

When minotaur.apache.org is having problems, the following services
are disrupted:
 1) people.apache.org website
 2) people.apache.org/~userid websites
 3) planet.apache.org website (ease of access)
 4) Maven Repositories (ease of access)
 5) E-Mail Forwarding of userid@apache.org
 6) Mirror Network Seed  (ease of access)
 7) TLP & www.apache.org Website Seeds (ease of access)
 8) DNS Hidden Master

When brutus.apache.org is having problems:
 9) issues.apache.org Website & Databases
   9a) JIRA
   9b) Bugzilla
 10) cwiki.apache.org (Confluence)

When eos.apache.org is having problems:
 11) wiki.apache.org (MoinMoin)  [semi-mirrored to eu, but not trivial]
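
A minimal sketch of the inventory above as a host-to-service map, in
Python.  The data is only the lists above transcribed; the names
HOSTS, services_lost and hosts_by_impact are hypothetical and exist
only for illustration.  It ranks hosts by how many services go down
with them, which is one rough way to see where the single points of
failure are.

```python
# Host -> services disrupted when that host has problems.
# Transcribed from the lists above; nothing here is measured data.
HOSTS = {
    "minotaur.apache.org": [
        "people.apache.org website",
        "people.apache.org/~userid websites",
        "planet.apache.org website",
        "Maven Repositories",
        "E-Mail Forwarding of userid@apache.org",
        "Mirror Network Seed",
        "TLP & www.apache.org Website Seeds",
        "DNS Hidden Master",
    ],
    "brutus.apache.org": [
        "issues.apache.org (JIRA)",
        "issues.apache.org (Bugzilla)",
        "cwiki.apache.org (Confluence)",
    ],
    "eos.apache.org": [
        "wiki.apache.org (MoinMoin)",
    ],
}


def services_lost(host):
    """Return the services disrupted when `host` is down (hypothetical helper)."""
    return HOSTS.get(host, [])


def hosts_by_impact():
    """Return hosts ordered from most to fewest dependent services."""
    return sorted(HOSTS, key=lambda h: len(HOSTS[h]), reverse=True)


if __name__ == "__main__":
    for host in hosts_by_impact():
        print(f"{host}: {len(services_lost(host))} services")
```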

A common theme for many of the services hosted on minotaur is that
they use it as a seed host, because every committer has an account
there.  With LDAP backends coming online, we can start to use other
methods of authentication for committers.  The biggest bang for our
buck would be to figure out how to distribute files in a decentralized
way, without needing a centralized host (Website Seeds, Mirror Network
Seeds, Maven Repositories).

What can you do to help?

Pick one service, figure out a plan to make it reliable, preferably
hosted in both the EU and the US, recruit volunteers to help you out,
and make it happen.

We have a budget, and we will spend it on hardware as needed.  This
won't all be done in 1 year, but fixing one service at a time will put
us in a better place eventually.



