httpd-dev mailing list archives

From Marc Slemko <>
Subject RANT: microsoft trashes itself (fwd)
Date Sat, 03 Oct 1998 05:12:07 GMT
Below is my big Microsoft rant for the quarter.  Some of it is unfair,
some of it is based on stupid advertising people and not the real tech
behind it, but a scary amount is true.  This is only mildly on topic and
is very one sided and not all of my complaints below should be taken to
fully reflect reality or even my opinions.  They do sound good though.

For one of the best articles yet that I have seen on why not to use NT
for web servers:

First, they go on and on about how bad round robin DNS is because you
can't take a server down without people having trouble.  Well duhh.
Wonder why they are still using it for though?

Then they go on and on about how often servers crash, etc.  And the
problem "has been unsolvable".  (hmm... you know... I never have seen the
<load balancing boxes we use at work>, so maybe they don't actually

Then they go on about how it takes three hours to run chkdsk, which is
necessary after a server crash.

Then they go on about having "four to six" servers, when
Microsoft's own press releases talk about it needing forty, etc.

And they have an advanced technology called HttpMon 3.x, which pings each
web server once per minute, something that is more common than system
failures but much harder to detect!  Not only does it ping the machine,
but it does a GET and checks the HTTP status code!  If it gets a 200,
everything is happy.  500 there is a problem.  Hate to think what version
one and two were like.
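
For reference, the GET-and-check-the-status-code idea fits in a few lines.
This is only a sketch of the general technique in Python, not HttpMon's
actual code; the names classify and probe, the "up"/"down" labels and the
5 second timeout are all made up for illustration.

```python
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status code to 'up' or 'down' (hypothetical labels)."""
    return "up" if status == 200 else "down"


def probe(url, timeout=5.0):
    """GET the URL and classify the reply; any network error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return classify(err.code)
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return "down"
```

A monitor would call probe() on each server on a timer and act on the
result; the once-per-minute interval is the only scheduling detail the
article gives.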

But they figured out that once their website gets busy, their system took
all the servers out of service 'cause they were choking under the load. So
they decided the solution is to only take one out of three in each cluster
(hmm... maybe they were lying above about 4-6 servers; well, we know they
are lying since it couldn't run on that few) out of service.
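
The "only take one out of three" rule amounts to a cap on removals.  A
minimal sketch in Python, assuming the monitor already has a list of
failing servers per cluster; cap_removals, max_out and the server names
are hypothetical, and nothing here is taken from Microsoft's system.

```python
def cap_removals(failing, max_out=1):
    """Return the servers that may be pulled from rotation, at most max_out.

    Sorting just makes the choice deterministic for this sketch; a real
    monitor would presumably pick the worst offender.
    """
    return sorted(failing)[:max_out]


# All three servers in a cluster look bad under load, but only one leaves.
cluster_failing = ["web3", "web1", "web2"]
removed = cap_removals(cluster_failing)
still_serving = [s for s in cluster_failing if s not in removed]
```

The point of the cap is that an overloaded cluster keeps two of its three
machines in rotation no matter how bad the checks look.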

Hahahaha.  Ok.  They even explain the reason why they still are using
multiple IP addresses in the DNS: now they have a pool of addresses that
goes to a pool of servers in an M to N mapping.  Great, now when they are
hosed we can't just find one that works and use it but everything will be
randomly hosed.

Then, and I quote:

    Weeks: Prior to the Windows 98 launch, we had some severe content
    problems on the site. A couple of the content tree owners completely
    refreshed their entire sites, and caused some major problems on the

(aka. a couple of bad scripts and the whole site kept falling over!)

Then they go on about how this new technology gave them their first ever
100% availability day!  But even with this, they were below 99.8%
availability on more than one day out of 14.

    Wanke: Until Single IP, we were just like everyone else: we never had a
    100 percent day. Never.

Hahaha.  They must be listening to their own marketing department!  This
sort of thing isn't rocket science.

They continue:

    There was another unexpected side effect. 

    Weeks: With round robin DNS when you rebooted a box, for about two
    minutes while IIS is starting up, the box would be taking up to 100
    hits per second. The performance monitor counters, all of the IIS
    counters on the box would just go ballistic.  We called it 'Earthquake

    The Single IP solution put an end to Earthquake mode. Now, HttpMon
    waits until the server is fully started and returning pages before
    adding it back into the pool of active servers.

Imagine that!  Unexpected side effect.  Well.  If that is unexpected...
well.... I am beside myself.
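
Holding a rebooted box out of the pool until it actually returns pages is
a short polling loop.  The sketch below is in Python and is an assumption
about the general shape, not a description of HttpMon; wait_until_ready,
its parameters and the check callable are hypothetical names.

```python
import time


def wait_until_ready(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll check() until it returns True.

    Returns True as soon as one check passes, or False if every attempt
    fails.  The caller adds the server back to the pool only on True.
    """
    for _ in range(attempts):
        if check():
            return True
        sleep(delay)
    return False
```

Here check would be something like a GET against the server that passes
only on a 200 reply, so no traffic reaches the box while IIS is still
starting up.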

This also implies that 100 hits/sec is for some reason way above normal
load.  This is on quad PPro boxes, people.  Sure, it is nearly all
dynamic content (largely because of poor web site design), but even
what they are doing isn't _that_ hard.


    Disaster Strikes; Few Notice

    At just before midnight on Wednesday July 8, a router failed, taking
    down a significant portion of the infrastructure that connects
    Microsoft's Web servers to its Internet customers. The problem
    persisted for much of the next day.

    Single IP made the day much more bearable for customers visiting the
    Microsoft Web site. Less than 8 percent of the traffic that hit the
    Microsoft Web servers during this time was affected by this massive
    network failure, which under the old system would have affected 12
    percent of hits to the site (see Figure 2). A future planned innovation
    to the Single IP solution called data clustering will shield end users
    from even these types of network failures

8% failure because of a single bad router and it takes a DAY to fix
it?  Good god.

They have a graph of availability at

Seems like 95% availability for their website wasn't too abnormal (3/10
days would be 95% or less with their old setup).  Do you understand that?
That means roughly an hour and a quarter of downtime in a day.
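
The arithmetic behind that is easy to check.  A small Python helper
(downtime_minutes is a made-up name) converts an availability percentage
into minutes of downtime over a 24 hour day:

```python
def downtime_minutes(availability_pct, period_hours=24.0):
    """Minutes of downtime implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * period_hours * 60.0


old_setup = downtime_minutes(95.0)    # 5% of 1440 minutes
new_setup = downtime_minutes(99.8)    # 0.2% of 1440 minutes
```

At 95% that works out to about 72 minutes a day, and even the 99.8%
figure quoted earlier still means nearly three minutes a day.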

Their conclusion:

    Weeks: Single IP is a real success story. 

    Wanke: It's over. Gamepoint. We won. 

hahahahahahahahah.  I pity them.

Oh, and is funny too.

Their search servers handle a whopping 2.5 queries/sec each.  They think
ODBC is the best thing in the world for their servers because:

    Another example is the Open Database Connectivity (ODBC), the virtual
    pipeline that transports data between applications. The ODBC lies on
    top of Windows NT and is automatically configured to work with the
    other applications. "You don't have to install it, you don't have to
    worry about it, you don't have to deal with configuring the ODBC at
    all," Mitchell says.

They reveal another great secret technology: connection pooling.  They don't
open a new connection to the database for every hit, but keep one 
connection over multiple hits!  Wow!  Must have taken them years to
think that one up.
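
For anyone who has not seen it, connection pooling looks roughly like
this.  The sketch is in Python with an in-memory SQLite database standing
in for whatever they run behind ODBC; ConnectionPool, handle_hit and the
query are illustrative assumptions, not their implementation.

```python
import queue
import sqlite3


class ConnectionPool:
    """Open a fixed number of connections once and hand them out per hit."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a connection be reused by
            # whichever worker thread borrows it next.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        """Borrow an idle connection, blocking until one is free."""
        return self._idle.get()

    def release(self, conn):
        """Return a borrowed connection to the pool."""
        self._idle.put(conn)


def handle_hit(pool):
    """Serve one request with a borrowed connection instead of a new one."""
    conn = pool.acquire()
    try:
        return conn.execute("SELECT 1").fetchone()[0]
    finally:
        pool.release(conn)
```

The cost of opening the connection is paid once at startup, and every hit
after that reuses an existing one.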


    "A poorly coded page on a highly hit site can have adverse effects,"
    Weeks says. "It can hamper the site's availability. That's why it is
    important for us to test our content before it goes live."

One page, the whole thing goes down.

And, to conclude, they say:

    We certainly hope you'll dig into our dog food with gusto! 

I think what they are shovelling comes out of the other end of the dog...
