maven-repo-maintainers mailing list archives

From Contegix Notifications <notificati...@contegix.com>
Subject Contegix Network Incident Report
Date Mon, 06 Jul 2009 07:46:17 GMT
Contegix Customer:

Please do not reply to this email.  If you have any questions, please submit a support request
to support@contegix.com.

At approximately 11:39 AM on July 2nd, our NOC engineers began to receive several monitor
alarms alerting us to a potential network issue. We found that our core switches were
dropping packets for both internal and external traffic.

We began to investigate and found abnormal traffic lights on one of our intrusion prevention
systems. At that time, we believed this to be the cause and physically bypassed the units.
We quickly determined that this was not the root cause, as the problem persisted. We then
began to troubleshoot our core switching.

At approximately 11:59 AM, we determined there was a multicast packet storm on our network.
Due to the high number of packets, the CPUs in both core switches reached maximum capacity,
which caused packet loss. After further debugging, we found that the storm involved a routing
protocol (VRRP-E) multicast IP and originated from a specific customer core switch port.
The customer connected to this port had had a switch malfunction a few minutes prior to the
network issue, and we determined this could be the cause. At approximately 12:05 PM, we
disabled the customer port, and the CPUs on our core switches began to stabilize.

Network availability to internal and external destinations was restored, but we found that
we still could not reach a few external destinations. Also, traffic on our network was
increasing but had not returned to normal utilization. After further troubleshooting, we
found that we could not route out through Level(3)’s network. Based on our observations and
data, we could not determine the reason for the Level(3) issues. At approximately 12:19 PM,
we disabled BGP with Level(3). Once this was disabled, our network returned to normal and
traffic flowed correctly through outbound routes.

While the issue started when a customer replaced a switch, we do not believe this is the
direct cause. We do suspect that it triggered a bug in our core switch software despite all
engineered precautions. We are working closely with the hardware manufacturer to determine
the exact cause. We will forward any new information on this issue and its long-term
resolution. In the interim, we have placed a moratorium on connecting new customer switching
equipment to our core switches. In addition, we restored our BGP session with Level(3) once
it was determined to be safe.

We apologize for any inconvenience this may have created for you or your customers. Our
reliable network is one of our great assets, and we place a great deal of emphasis on making
sure it is working optimally. As mentioned before, we are working closely with the switch
manufacturer to identify and fix this bug to make sure this does not occur again.


Sincerely,
Contegix Support

---
Contegix
900 Walnut Street
Suite 700
Saint Louis, MO  63102
Phone: 314.622.6200 ext. 3
Toll Free: 877.4.CONTEGIX ext. 3
Fax: 314.621.4422
E-mail: support@contegix.com
Beyond Managed Hosting(r) for Your Enterprise
Favorite Linux-Friendly Hosting Company - Linux Journal
http://www.contegix.com/linuxjournal

