Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 8EB5F200C3E for ; Tue, 7 Mar 2017 00:02:45 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 8D77F160B84; Mon, 6 Mar 2017 23:02:45 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 6377B160B76 for ; Tue, 7 Mar 2017 00:02:44 +0100 (CET) Received: (qmail 67632 invoked by uid 500); 6 Mar 2017 23:02:43 -0000 Mailing-List: contact commits-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@accumulo.apache.org Delivered-To: mailing list commits@accumulo.apache.org Received: (qmail 67623 invoked by uid 99); 6 Mar 2017 23:02:43 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 06 Mar 2017 23:02:43 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 7F1F6DFDE4; Mon, 6 Mar 2017 23:02:43 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: elserj@apache.org To: commits@accumulo.apache.org Date: Mon, 06 Mar 2017 23:02:43 -0000 Message-Id: <88cc920203294ca487b4ae7dae3111c7@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: [1/2] accumulo-website git commit: Fix the date on the security peformance post archived-at: Mon, 06 Mar 2017 23:02:45 -0000 Repository: accumulo-website Updated Branches: refs/heads/asf-site 803c95d0c -> ccd797a95 refs/heads/master 5679ae1b7 -> fa21326b7 Fix the date on the security peformance post Project: http://git-wip-us.apache.org/repos/asf/accumulo-website/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo-website/commit/fa21326b Tree: http://git-wip-us.apache.org/repos/asf/accumulo-website/tree/fa21326b Diff: http://git-wip-us.apache.org/repos/asf/accumulo-website/diff/fa21326b Branch: refs/heads/master Commit: fa21326b7e556fad386a7301ef90df006f575575 Parents: 5679ae1 Author: Josh Elser Authored: Mon Mar 6 18:01:55 2017 -0500 Committer: Josh Elser Committed: Mon Mar 6 18:01:55 2017 -0500 ---------------------------------------------------------------------- ...7-02-23-security-performance-implications.md | 173 ------------------- ...7-03-06-security-performance-implications.md | 173 +++++++++++++++++++ 2 files changed, 173 insertions(+), 173 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/fa21326b/_posts/blog/2017-02-23-security-performance-implications.md ---------------------------------------------------------------------- diff --git a/_posts/blog/2017-02-23-security-performance-implications.md b/_posts/blog/2017-02-23-security-performance-implications.md deleted file mode 100644 index 01d2410..0000000 --- a/_posts/blog/2017-02-23-security-performance-implications.md +++ /dev/null @@ -1,173 +0,0 @@ - -The purpose of this two part series was to measure the performance impact of -various security configurations on a cluster running Apache Accumulo’s -continuous ingest suite. The tests were performed using Amazon Web -Services (AWS), Hortonworks Data Platform 2.4 and Accumulo 1.7. Each of -the five different security settings in Accumulo 1.7 was tested including -no security, SSL, and SASL with Kerberos authentication for the three quality -of protection levels (auth, auth-int, auth-conf). KDC was MIT. HDFS was -configured to use Kerberos for authentication and had service level -authorization on. Other than that, no other security settings (HTTPS, RPC -protection, data transfer encryption, etc) were enabled. Timely was a -separate, single node HDFS/Zookeeper/Accumulo instance. - -## Intro - -All runs utilized the continuous ingest suite that ships with Accumulo (a -standard method to measure performance in Accumulo). It generates random -graph data and inserts it into Accumulo, creating -a long linked list of entries. Part 1 was run with just continuous ingest. -Based on the test results, there was a measurable performance impact as each additional security configuration was put in place. - -## Methodology - -We ran 5 tests, one for each security configuration. Each iteration of each test inserted 2 billion entries. Batch writers were configured with 500K max mem -to artificially inflate the overall write overhead. This was performed on a -small cluster on AWS. - -Each test used one of the following security configurations: - -* No security - Default -* Two way SSL -* Kerberos/SASL with auth - * auth is just Kerberos authentication between client and server. Each end of the RPC definitively knows who the other is. -* Kerberos/SASL with auth-int - * Builds on auth, also providing message integrity checks of the data going across the wire. You also know that the message you received was not altered. -* Kerberos/SASL with auth-conf - * Builds on auth-int, also providing confidentiality of the message that was sent to prevent others from reading it (aka wire-encryption). - -For each test, five iterations were run to obtain a min, max, and median -time elapsed at each security configuration. After each iteration, -Hadoop, and Zookeeper processes were restarted, Accumulo tables are -wiped clean and tables are recreated. In addition, pagecache, dentries -and inodes are dropped by issuing a ‘3’ command on -/proc/sys/vm/drop\_caches to ensure that the OS is not caching things to disk -that might affect the benchmark. The following sequence was performed -between iterations: - -1. Bring down Accumulo -2. Bring down Zookeeper -3. Bring down Hadoop -4. Run sync command -5. Drop OS cache -6. Bring up Hadoop -7. Bring up Zookeeper -8. Bring up Accumulo -9. Drop tables -10. Create tables - -For each iteration, the results were stored, fed into [Timely](https://nationalsecurityagency.github.io/timely/), and viewed with Grafana. -Since the runs were executed sequentially, the start epochs for each run did not align. -To mitigate, the entries for each run were inserted -with the same relative epoch for convenient comparison in Grafana. - -The table configurations for Accumulo remain the same throughout the -different iterations and security levels. The Accumulo site -configurations differ only due to the different settings for the -security level configurations. - -## Environment - -In order to perform the testing, a small AWS cluster was setup using 14 -hosts on EC2. Two i2.xlarge instances were used as master nodes and eight -d2.xlarge instances were used for workers. In addition, two c4.4xlarge -instances were used for ingesters, one m4.2xlarge instance was used for -Timely, and one m4.xlarge instance was used for Apache Ambari. A logical -diagram of the setup is depicted below: - -![]({{ site.baseurl}}/images/blog/201702_security/figure1.png){:width="400px"} - -Figure 1 - Cluster Layout, Roles, and Instance Types on AWS. - -The types of nodes and their function are given below: - -{: #instance_types .table } -|Node Type|AWS EC2 Type|EC2 Type Details|Quantity| -|:---|:---|:---|:---| -|Ingest Nodes|c4.4xlarge|16 core, 30 GB RAM|2| -|Worker Node|d2.xlarge|4 cores, 30.5 GB RAM, 3x2T GB HD|8| -|Master Node|i2.xlarge|4 cores, 30.5 GB RAM, 1x800GB SSD|2| -|Admin Node|m4.xlarge|4 cores, 16 GB RAM|1| -|Timely Node|m4.2xlarge|8 cores, 32 GB RAM|1| - - -Table 1 – AWS Instance Types, Role, Details, and Quantities - - -## Results - -The median, max, and min of the milliseconds elapsed -time of all iterations for each test is displayed below. The percentage change -columns compare the Median, Max, and Min respectively from the no -security level to each security configuration (e.g. no security Median -vs. auth-int Median, no security Max vs. auth-int Max). - - -{: #results .table } -| Security Level | Median | Standard Deviation | Max | Min | % Change (nosec Median vs. Median) | % Change (nosec Max vs. Max) | % Change (nosec Min vs. Min) | Delta from Previous Level (Median)| -| ---------------- |---------: |---------:|----------:| ---------:| ------------------------------------: |------------------------------:| ------------------------------:| ------------------------------------:| -| no security | 7829394 | 139340 | 8143035| 7764309 | 0.00% | 0.00% | 0.00% | 0.00%| -|ssl | 8292760 | 87012 | 8464060 | 8204955 | 5.92% | 3.94% | 5.68% | 5.92%| -| auth | 8859552 | 134109 | 9047971| 8657618 | 13.16% | 11.11% | 11.51% | 6.83%| -| auth-int | 9500737 | 155968 | 9753424 | 9282371 | 21.34% | 19.78% | 19.55% | 7.24%| -|auth-conf | 9479635 | 170823 | 9776580 | 9282189 | 21.08% | 20.06% | 19.55% | -0.22%| - -Table 2 – Summarized Time Elapsed for Each Security Level - - -## Plots - -Below are some snapshots of *stats.out elements via Grafana that were inserted -into Timely with the same relative start time. Each graph represents a field -in the output generated by [ContinuousStatsCollector](https://github.com/apache/accumulo/blob/1.7/test/src/main/java/org/apache/accumulo/test/continuous/ContinuousStatsCollector.java) - -### [TABLE\_RECS](https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/master/thrift/TableInfo.java#L73) -(Number of records in the continuous ingest table. Down sample=1m, aggregate=avg) - -[![]({{site.baseurl}}/images/blog/201702_security/tableRecs.png){:width="800px"}]({{site.baseurl}}/images/blog/201702_security/tableRecs.png) - -### [TOTAL\_INGEST](https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/master/thrift/TableInfo.java#L77) -(Ingest rate for Accumulo instance. Down sample=5m, aggregate=avg) - -[![]({{ site.baseurl}}/images/blog/201702_security/totalIngest.png){:width="800px"}]({{ site.baseurl}}/images/blog/201702_security/totalIngest.png) - -### [AVG\_FILES/TABLET](https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/util/Stat.java#L63) -(Average number of files per Accumulo tablet. Down sample=1m, aggregate=avg) - -[![]({{ site.baseurl}}/images/blog/201702_security/avgFilesTab.png){:width="800px"}]({{ site.baseurl}}/images/blog/201702_security/avgFilesTab.png) - -### [ACCUMULO\_FILES](https://github.com/apache/accumulo/blob/1.7/test/src/main/java/org/apache/accumulo/test/continuous/ContinuousStatsCollector.java#L127) -(Total number of files for Accumulo. Down sample=1m, aggregate=avg) - -[![]({{ site.baseurl}}/images/blog/201702_security/accumuloFiles.png){:width="800px"}]({{ site.baseurl}}/images/blog/201702_security/accumuloFiles.png) - - -As can be seen in the plots above, the different security settings have -relatively consistent, discernable median run characteristics. The big -dip in each TOTAL_INGEST coincides with a large number of major -compactions, a rate decrease for TABLE_RECS, and a decrease in -AVG_FILES/TABLET. - - -## Final Thoughts - -The biggest performance -hits to run duration median (compared to default security) were ~21% for -auth-int and auth-conf. Interesting to note that SSL's median run duration was -lower than all SASL configs and that auth-conf's was lower than auth-int. -Initial speculation for these oddities revolved around the -[Thrift server](https://github.com/m1ch1/mapkeeper/wiki/Thrift-Java-Servers-Compared) -implementations, but the Thrift differences will not explain the auth-conf/int -disparity since both utilize TThreadPoolServer. It was certainly unexpected that the -addition of wire encryption would yield a faster median run duration. This result -prompted, as a sanity check, sniffing the net traffic (in a contrived example -not during a timed run) in both auth-conf and auth-int to ensure that the message -contents were actually obfuscated in auth-conf (they were) and not obfuscated in -auth-int (they weren't). - - -## Future Work - -Part 2 of this series will consist of the same continuous ingest loads and -configurations with the addition of a query load on the system. - http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/fa21326b/_posts/blog/2017-03-06-security-performance-implications.md ---------------------------------------------------------------------- diff --git a/_posts/blog/2017-03-06-security-performance-implications.md b/_posts/blog/2017-03-06-security-performance-implications.md new file mode 100644 index 0000000..01d2410 --- /dev/null +++ b/_posts/blog/2017-03-06-security-performance-implications.md @@ -0,0 +1,173 @@ + +The purpose of this two part series was to measure the performance impact of +various security configurations on a cluster running Apache Accumulo’s +continuous ingest suite. The tests were performed using Amazon Web +Services (AWS), Hortonworks Data Platform 2.4 and Accumulo 1.7. Each of +the five different security settings in Accumulo 1.7 was tested including +no security, SSL, and SASL with Kerberos authentication for the three quality +of protection levels (auth, auth-int, auth-conf). KDC was MIT. HDFS was +configured to use Kerberos for authentication and had service level +authorization on. Other than that, no other security settings (HTTPS, RPC +protection, data transfer encryption, etc) were enabled. Timely was a +separate, single node HDFS/Zookeeper/Accumulo instance. + +## Intro + +All runs utilized the continuous ingest suite that ships with Accumulo (a +standard method to measure performance in Accumulo). It generates random +graph data and inserts it into Accumulo, creating +a long linked list of entries. Part 1 was run with just continuous ingest. +Based on the test results, there was a measurable performance impact as each additional security configuration was put in place. + +## Methodology + +We ran 5 tests, one for each security configuration. Each iteration of each test inserted 2 billion entries. Batch writers were configured with 500K max mem +to artificially inflate the overall write overhead. This was performed on a +small cluster on AWS. + +Each test used one of the following security configurations: + +* No security - Default +* Two way SSL +* Kerberos/SASL with auth + * auth is just Kerberos authentication between client and server. Each end of the RPC definitively knows who the other is. +* Kerberos/SASL with auth-int + * Builds on auth, also providing message integrity checks of the data going across the wire. You also know that the message you received was not altered. +* Kerberos/SASL with auth-conf + * Builds on auth-int, also providing confidentiality of the message that was sent to prevent others from reading it (aka wire-encryption). + +For each test, five iterations were run to obtain a min, max, and median +time elapsed at each security configuration. After each iteration, +Hadoop, and Zookeeper processes were restarted, Accumulo tables are +wiped clean and tables are recreated. In addition, pagecache, dentries +and inodes are dropped by issuing a ‘3’ command on +/proc/sys/vm/drop\_caches to ensure that the OS is not caching things to disk +that might affect the benchmark. The following sequence was performed +between iterations: + +1. Bring down Accumulo +2. Bring down Zookeeper +3. Bring down Hadoop +4. Run sync command +5. Drop OS cache +6. Bring up Hadoop +7. Bring up Zookeeper +8. Bring up Accumulo +9. Drop tables +10. Create tables + +For each iteration, the results were stored, fed into [Timely](https://nationalsecurityagency.github.io/timely/), and viewed with Grafana. +Since the runs were executed sequentially, the start epochs for each run did not align. +To mitigate, the entries for each run were inserted +with the same relative epoch for convenient comparison in Grafana. + +The table configurations for Accumulo remain the same throughout the +different iterations and security levels. The Accumulo site +configurations differ only due to the different settings for the +security level configurations. + +## Environment + +In order to perform the testing, a small AWS cluster was setup using 14 +hosts on EC2. Two i2.xlarge instances were used as master nodes and eight +d2.xlarge instances were used for workers. In addition, two c4.4xlarge +instances were used for ingesters, one m4.2xlarge instance was used for +Timely, and one m4.xlarge instance was used for Apache Ambari. A logical +diagram of the setup is depicted below: + +![]({{ site.baseurl}}/images/blog/201702_security/figure1.png){:width="400px"} + +Figure 1 - Cluster Layout, Roles, and Instance Types on AWS. + +The types of nodes and their function are given below: + +{: #instance_types .table } +|Node Type|AWS EC2 Type|EC2 Type Details|Quantity| +|:---|:---|:---|:---| +|Ingest Nodes|c4.4xlarge|16 core, 30 GB RAM|2| +|Worker Node|d2.xlarge|4 cores, 30.5 GB RAM, 3x2T GB HD|8| +|Master Node|i2.xlarge|4 cores, 30.5 GB RAM, 1x800GB SSD|2| +|Admin Node|m4.xlarge|4 cores, 16 GB RAM|1| +|Timely Node|m4.2xlarge|8 cores, 32 GB RAM|1| + + +Table 1 – AWS Instance Types, Role, Details, and Quantities + + +## Results + +The median, max, and min of the milliseconds elapsed +time of all iterations for each test is displayed below. The percentage change +columns compare the Median, Max, and Min respectively from the no +security level to each security configuration (e.g. no security Median +vs. auth-int Median, no security Max vs. auth-int Max). + + +{: #results .table } +| Security Level | Median | Standard Deviation | Max | Min | % Change (nosec Median vs. Median) | % Change (nosec Max vs. Max) | % Change (nosec Min vs. Min) | Delta from Previous Level (Median)| +| ---------------- |---------: |---------:|----------:| ---------:| ------------------------------------: |------------------------------:| ------------------------------:| ------------------------------------:| +| no security | 7829394 | 139340 | 8143035| 7764309 | 0.00% | 0.00% | 0.00% | 0.00%| +|ssl | 8292760 | 87012 | 8464060 | 8204955 | 5.92% | 3.94% | 5.68% | 5.92%| +| auth | 8859552 | 134109 | 9047971| 8657618 | 13.16% | 11.11% | 11.51% | 6.83%| +| auth-int | 9500737 | 155968 | 9753424 | 9282371 | 21.34% | 19.78% | 19.55% | 7.24%| +|auth-conf | 9479635 | 170823 | 9776580 | 9282189 | 21.08% | 20.06% | 19.55% | -0.22%| + +Table 2 – Summarized Time Elapsed for Each Security Level + + +## Plots + +Below are some snapshots of *stats.out elements via Grafana that were inserted +into Timely with the same relative start time. Each graph represents a field +in the output generated by [ContinuousStatsCollector](https://github.com/apache/accumulo/blob/1.7/test/src/main/java/org/apache/accumulo/test/continuous/ContinuousStatsCollector.java) + +### [TABLE\_RECS](https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/master/thrift/TableInfo.java#L73) +(Number of records in the continuous ingest table. Down sample=1m, aggregate=avg) + +[![]({{site.baseurl}}/images/blog/201702_security/tableRecs.png){:width="800px"}]({{site.baseurl}}/images/blog/201702_security/tableRecs.png) + +### [TOTAL\_INGEST](https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/master/thrift/TableInfo.java#L77) +(Ingest rate for Accumulo instance. Down sample=5m, aggregate=avg) + +[![]({{ site.baseurl}}/images/blog/201702_security/totalIngest.png){:width="800px"}]({{ site.baseurl}}/images/blog/201702_security/totalIngest.png) + +### [AVG\_FILES/TABLET](https://github.com/apache/accumulo/blob/1.7/core/src/main/java/org/apache/accumulo/core/util/Stat.java#L63) +(Average number of files per Accumulo tablet. Down sample=1m, aggregate=avg) + +[![]({{ site.baseurl}}/images/blog/201702_security/avgFilesTab.png){:width="800px"}]({{ site.baseurl}}/images/blog/201702_security/avgFilesTab.png) + +### [ACCUMULO\_FILES](https://github.com/apache/accumulo/blob/1.7/test/src/main/java/org/apache/accumulo/test/continuous/ContinuousStatsCollector.java#L127) +(Total number of files for Accumulo. Down sample=1m, aggregate=avg) + +[![]({{ site.baseurl}}/images/blog/201702_security/accumuloFiles.png){:width="800px"}]({{ site.baseurl}}/images/blog/201702_security/accumuloFiles.png) + + +As can be seen in the plots above, the different security settings have +relatively consistent, discernable median run characteristics. The big +dip in each TOTAL_INGEST coincides with a large number of major +compactions, a rate decrease for TABLE_RECS, and a decrease in +AVG_FILES/TABLET. + + +## Final Thoughts + +The biggest performance +hits to run duration median (compared to default security) were ~21% for +auth-int and auth-conf. Interesting to note that SSL's median run duration was +lower than all SASL configs and that auth-conf's was lower than auth-int. +Initial speculation for these oddities revolved around the +[Thrift server](https://github.com/m1ch1/mapkeeper/wiki/Thrift-Java-Servers-Compared) +implementations, but the Thrift differences will not explain the auth-conf/int +disparity since both utilize TThreadPoolServer. It was certainly unexpected that the +addition of wire encryption would yield a faster median run duration. This result +prompted, as a sanity check, sniffing the net traffic (in a contrived example +not during a timed run) in both auth-conf and auth-int to ensure that the message +contents were actually obfuscated in auth-conf (they were) and not obfuscated in +auth-int (they weren't). + + +## Future Work + +Part 2 of this series will consist of the same continuous ingest loads and +configurations with the addition of a query load on the system. +