Return-Path:
Delivered-To: apmail-maven-repo-maintainers-archive@minotaur.apache.org
Received: (qmail 27327 invoked from network); 6 Jul 2009 07:47:18 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 6 Jul 2009 07:47:18 -0000
Received: (qmail 46944 invoked by uid 500); 6 Jul 2009 07:47:28 -0000
Delivered-To: apmail-maven-repo-maintainers-archive@maven.apache.org
Received: (qmail 46884 invoked by uid 500); 6 Jul 2009 07:47:28 -0000
Mailing-List: contact repo-maintainers-help@maven.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: repo-maintainers@maven.apache.org
Delivered-To: mailing list repo-maintainers@maven.apache.org
Received: (qmail 46874 invoked by uid 99); 6 Jul 2009 07:47:27 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Jul 2009 07:47:27 +0000
X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (nike.apache.org: local policy)
Received: from [63.246.22.115] (HELO qmail01.contegix.com) (63.246.22.115) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Jul 2009 07:47:16 +0000
Received: (qmail 29916 invoked by uid 89); 6 Jul 2009 07:46:50 -0000
Received: from unknown (HELO ?192.168.0.163?) (notifications@contegix.com@97.85.180.217) by qmail01.contegix.com with ESMTPA; 6 Jul 2009 07:46:50 -0000
Message-ID: <4A51ABC9.80904@contegix.com>
Date: Mon, 06 Jul 2009 02:46:17 -0500
From: Contegix Notifications
User-Agent: Thunderbird 2.0.0.22 (Macintosh/20090605)
MIME-Version: 1.0
To: notifications@contegix.com
Subject: Contegix Network Incident Report
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
X-Virus-Checked: Checked by ClamAV on apache.org

Contegix Customer:

Please do not reply to this email.
If you have any questions, please submit a support request to support@contegix.com.

At approximately 11:39 AM on July 2nd, our NOC engineers began to receive several monitor alarms alerting us to a potential network issue. We found our core switches were dropping packets for both internal and external traffic. We began to investigate and found abnormal traffic lights on one of our intrusion prevention systems. At that time, we believed this to be the cause and physically bypassed the units. We quickly determined that this was not the root cause, as the problem persisted. We then began to troubleshoot our core switching.

At approximately 11:59 AM, we determined there was a multicast packet storm on our network. Due to the high number of packets, the CPUs in both core switches reached maximum capacity, which caused packet loss. After further debugging, we found that the storm came from a routing protocol (VRRP-E) multicast IP and originated from a specific customer core switch port. The customer connected to this port had had a switch malfunction a few minutes prior to the network issue, and we determined this could be the cause.

At approximately 12:05 PM, we disabled the customer port and the CPUs on our core switches began to stabilize. Network availability to internal and external destinations was restored, but we found that we still could not reach a few external destinations. Also, traffic was increasing on our network but had not reached normal utilization. After further troubleshooting, we found that we could not route out through Level(3)'s network. Based on our observations and data, we could not determine the reason for the Level(3) issues.

At approximately 12:19 PM, we disabled BGP with Level(3). Once this was disabled, our network returned to normal and traffic flowed through to outbound routes correctly.

While the issue started when a customer replaced a switch, we do not believe this is the direct cause.
We do suspect that it triggered a bug in our core switch software despite all engineered precautions. We are working closely with the hardware manufacturer to determine the exact cause. We will forward any new information on this issue and its long-term resolution.

In the interim, we have placed a moratorium on adding new customer switching equipment connected to our core switches. In addition, we restored our BGP session with Level(3) once it was determined to be safe.

We apologize for any inconvenience this may have created for you or your customers. Our reliable network is one of our great assets, and we place a great deal of emphasis on making sure it is working optimally. As mentioned before, we are working closely with the switch manufacturer to identify and fix this bug to make sure this does not occur again.

Sincerely,

Contegix Support

---
Contegix
900 Walnut Street
Suite 700
Saint Louis, MO 63102

Phone: 314.622.6200 ext. 3
Toll Free: 877.4.CONTEGIX ext. 3
Fax: 314.621.4422
E-mail: support@contegix.com

Beyond Managed Hosting(r) for Your Enterprise

Favorite Linux-Friendly Hosting Company - Linux Journal
http://www.contegix.com/linuxjournal