Date: Fri, 19 Aug 2011 00:48:29 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID: <949446583.51410.1313714909273.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-7550) Need for Integrity Validation of RPC

[
https://issues.apache.org/jira/browse/HADOOP-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087431#comment-13087431 ]

Allen Wittenauer commented on HADOOP-7550:
------------------------------------------

From what I remember, krb5 vs krb5i was like 5-10% perf degradation. krb5p was like another 5%. I'd expect going from nothing to krb5i or krb5p to be fairly horrific. On the plus side, these are already implemented, known quantities, etc.

With hardware accelerated crypto now common, the numbers are likely lower for anyone using anything relatively modern on non-Intel gear. For Intel-gear, enabling AES support would probably help.

> Need for Integrity Validation of RPC
> ------------------------------------
>
>                 Key: HADOOP-7550
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7550
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Dave Thompson
>            Assignee: Dave Thompson
>
> Some recent investigation of network packet corruption has shown a need for Hadoop RPC integrity validation beyond the assurances already provided by the 802.3 link layer and the 16-bit TCP checksum.
>
> During an unusual occurrence on a 4k node cluster, we've seen as high as 4 TCP anomalies per second on a single node, sustained over an hour (14k per hour). A TCP anomaly would be an escaped link layer packet that resulted in a TCP checksum failure, a TCP packet out of sequence, or a TCP packet size error.
>
> According to this paper[*]: http://tinyurl.com/3aue72r
> the 16-bit TCP checksum has an effective detection rate of only about 2^10, so 1 in 1024 errors may escape detection. In fact, what originally alerted us to this issue was seeing failures due to bit-errors in Hadoop traffic. Extrapolating from that paper, one might expect 14 escaped packet errors per hour for that single node of a 4k cluster.
> While the above error rate was unusually high due to a broadband aggregate switch issue, Hadoop not having an integrity check on RPC makes it difficult to discover, and to limit, any data damage caused by acting on a corrupt RPC message.
> ------
> [*] In case this jira outlives that tinyurl, the IEEE paper cited is: "Performance of Checksums and CRCs over Real Data" by Jonathan Stone, Michael Greenwald, Craig Partridge, Jim Hughes.
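
For readers following the numbers in the issue description, here is a rough sketch of the extrapolation (4 anomalies per second, 1 in 2^10 escaping the TCP checksum) together with the kind of per-message check the issue asks for. It is not Hadoop's RPC code: the function names, the CRC-32 choice, and the 4-byte trailer layout are all assumptions made for illustration only.

```python
import struct
import zlib

# Figures quoted in the issue description.
ANOMALIES_PER_SECOND = 4     # peak TCP anomalies observed on a single node
ESCAPE_RATE = 1 / 2**10      # share of errors said to slip past the TCP checksum


def escaped_errors_per_hour(anomalies_per_second=ANOMALIES_PER_SECOND,
                            escape_rate=ESCAPE_RATE):
    """Expected number of corrupt packets per hour that pass the TCP checksum."""
    return anomalies_per_second * 3600 * escape_rate


def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload (hypothetical wire format)."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the CRC-32 trailer does not match."""
    if len(message) < 4:
        raise ValueError("message too short to carry a CRC-32 trailer")
    payload, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("RPC payload failed CRC-32 check")
    return payload
```

With the quoted figures, `escaped_errors_per_hour()` comes to about 14, matching the estimate in the description. A receiver calling `unframe()` would reject a message with a flipped bit instead of acting on it. A plain CRC only guards against accidental corruption; the krb5i and krb5p options mentioned in the comment also protect against deliberate tampering, at the performance cost discussed there.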