tomcat-users mailing list archives

From "George Sexton"
Subject RE: Multiple data centers and redundency?
Date Wed, 26 Aug 2009 19:39:27 GMT
> -----Original Message-----
> From: Jeffrey Janner
> Sent: Wednesday, August 26, 2009 12:53 PM
> To: Tomcat Users List
> Subject: RE: Multiple data centers and redundency?
> George -
> This is why I hate statistics.  You can make them say anything.
> Wouldn't the better calculation be based on the average number of
> currently active sessions at one data center, since when it goes down,
> that is the number of users who will be affected?

You're talking about the actual number of people affected.

I'm talking about the probability of any one session being affected.

I mean really, if once every 3 years or so 500 people have to log in again
to do their transaction, is it that big a deal?

> The calculation should also include probability and length of outage

I talked about probability. 1 instance per 3 years was the baseline.

Length of the outage is not a factor. They have a 2nd data center that
things will transparently fail over to, so the length of the outage is
irrelevant.

Length of outage would only count if you were putting everything in one data
center and wanted to calculate the risk/loss for an outage. 

> and then weighed against the downside of the end user getting an error
> message and having to login again and start over.

> I'd really only see it as being a problem if there were extremely long-
> running transactions that would have to be restarted in the event of an
> outage (or a really poorly designed app).

There's a lot of math you can only do if you know the app. How long are
sessions really? If they're 20 minutes long, then I'm overstating it by a
factor of 6.

Are session counts evenly distributed through the day? If not, probability
is going to vary based on the time of day of the outage. So, if your session
is 2 hours long, and all your traffic is during the day, then that part
becomes 2/8, not 2/24. Are outages evenly distributed throughout the day? If
so, then there's about a 2/3 chance the outage will come at an off-peak
time.
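
To make those adjustments concrete, here is a rough Python sketch. The
function name `session_share` and its parameters are made up for
illustration, and the 8-hour traffic window is only what the "2/8" figure
implies. It is a back-of-envelope check, not a model of any real app.

```python
from fractions import Fraction

def session_share(session_minutes, traffic_hours_per_day=24):
    """Fraction of the daily traffic window that one session occupies."""
    return Fraction(session_minutes, traffic_hours_per_day * 60)

# 2-hour baseline versus 20-minute sessions: how much the baseline overstates
overstatement = session_share(120) / session_share(20)
print(overstatement)            # 6

# All traffic squeezed into an 8-hour day (assumed window)
print(session_share(120, 8))    # 1/4, i.e. 2/8
print(session_share(120))       # 1/12, i.e. 2/24

# Outage times spread evenly over 24 hours: chance of landing outside
# the 8-hour traffic window
off_peak = Fraction(24 - 8, 24)
print(off_peak)                 # 2/3
```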

OTOH, the really big data centers don't have anything like an outage per 3
years. Just doesn't happen.

> Jeff
> p.s. I'm with you on this probably being a minor concern causing a
> larger headache, but we should get the scope of the problem correct to
> begin with. (said by one who both supports and uses webapps that
> support large numbers of users)

Not very many people take the time to understand their actual risk, so they
end up over-engineering/over-spending on solutions. The problem with
over-engineered solutions is that they tend to be under-tested, so they
break when you need them.

> -----Original Message-----
> From: George Sexton
> Sent: Wednesday, August 26, 2009 9:52 AM
> To: 'Tomcat Users List'
> Subject: RE: Multiple data centers and redundency?
> > -----Original Message-----
> > From: Andre-John Mas
> > Sent: Tuesday, August 25, 2009 6:30 PM
> > To: Tomcat Users List
> > Subject: Multiple data centers and redundency?
> >
> > Hi,
> >
> > I have been asked to look into a solution that would involve a few
> > different data centres each with their own set of load balanced
> > Tomcat servers. The requirement is for the users not to lose their
> > session if one data center goes down. I have never had to work on
> > something this large and have no idea to what extent this can be
> > achieved with Tomcat.
> >
> > My initial thoughts would be for each data center to have a session
> > pool, which is synced with each other, so if ever a Tomcat server or
> > data center goes down they can check in the pool to see if it exists
> > and then reuse that. It would mean extra communication behind the
> > scenes, but I see no other way to go about it.
> >
> > Any help would be appreciated.
> >
> > André-John
>
> Has anyone really done any math to determine the risk?
>
> Here's an example of what I mean.
>
> Say you are in a high quality co-location. The probability of an outage
> is maybe once in 3 years. That's overstating the probability in my mind,
> but we'll use it. Let's also say that you have a high quality clustering
> solution in place in each data center that handles failover of any
> equipment WITHIN the data center.
>
> Say the average length of a user/customer session is 2 hours, and your
> failover system will route any new users to a remaining data center. I
> think 2 hours is kind of a long session but we'll use it. Say you have
> two data centers.
>
> So, the probability of an average customer being affected by a data
> center outage is:
>
>   (2 hours / 24 hours per day) * (1 / (3 * 365 days)) / (2 data centers)
>   = 1/12 * 1/1095 / 2
>   = 1/26280
>
> The probability of an average customer being affected by an outage is
> conservatively 1 in 26280. Expressed as a percentage, the probability
> of any individual session being affected is 0.0038%.
>
> Is your application really so big and critical that you have to address
> this very small percentage chance of a session being interrupted?
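
For anyone who wants to plug in their own numbers, the estimate quoted
above can be reproduced with a short Python sketch. The function and
parameter names are illustrative only. It bakes in the same assumptions as
the estimate: one outage per 3 years, sessions spread evenly over a 24-hour
day, and users split evenly across the data centers.

```python
from fractions import Fraction

def session_outage_probability(session_hours, years_between_outages,
                               data_centers):
    """Rough chance that a single session is hit by a data center outage."""
    # Fraction of a day that one session occupies
    session_fraction = Fraction(session_hours, 24)
    # Chance that the outage falls on any given day
    outage_per_day = Fraction(1, years_between_outages * 365)
    # Only sessions at the failed data center are affected
    return session_fraction * outage_per_day / data_centers

p = session_outage_probability(session_hours=2, years_between_outages=3,
                               data_centers=2)
print(f"1 in {1 / p}")        # 1 in 26280
print(f"{float(p):.4%}")      # 0.0038%
```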

George Sexton
MH Software, Inc.
Voice: 303 438 9585
