drill-commits mailing list archives

From bridg...@apache.org
Subject [09/19] drill git commit: yaml date
Date Tue, 29 Nov 2016 22:42:47 GMT
yaml date


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/a0608abe
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/a0608abe
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/a0608abe

Branch: refs/heads/gh-pages
Commit: a0608abedf54bbee5b0f236c07eaef494b2fb273
Parents: 0020c69
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Mon Nov 21 14:28:39 2016 -0800
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Mon Nov 21 14:28:39 2016 -0800

----------------------------------------------------------------------
 .../010-query-plans.md                          | 150 +++++-----
 .../020-query-profiles.md                       | 286 +++++++++----------
 .../010-query-profile-tables.md                 | 170 +++++------
 .../010-query-plans-and-tuning-introduction.md  |  16 +-
 .../020-join-planning-guidelines.md             |  88 +++---
 ...030-guidelines-for-optimizing-aggregation.md |  44 +--
 .../040-modifying-query-planning-options.md     |  62 ++--
 .../060-enabling-query-queuing.md               |  98 +++----
 ...to-balance-performance-with-multi-tenancy.md |  22 +-
 9 files changed, 468 insertions(+), 468 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/identifying-performance-issues/010-query-plans.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/identifying-performance-issues/010-query-plans.md b/_docs/performance-tuning/identifying-performance-issues/010-query-plans.md
index cddbc49..9efb9a3 100644
--- a/_docs/performance-tuning/identifying-performance-issues/010-query-plans.md
+++ b/_docs/performance-tuning/identifying-performance-issues/010-query-plans.md
@@ -1,75 +1,75 @@
----
-title: "Query Plans"
-date:  
-parent: "Identifying Performance Issues"
----
-If you experience performance issues in Drill, you can typically identify the source of the issues in the query plans or profiles. This section describes the logical plan and physical plans.
-
-## Query Plans  
-
-Drill has an optimizer and a parallelizer that work together to plan a query. Drill creates logical, physical, and execution plans based on the available statistics for an associated set of files or data sources. The number of running Drill nodes and configured runtime settings contribute to how Drill plans and executes a query.
- 
-You can use [EXPLAIN commands]({{ site.baseurl }}/docs/explain-commands/) to view the logical and physical plans for a query, however you cannot view the execution plan. To see how Drill executed a query, you can view the query profile in the Drill Web Console at `<drill_node_ip_address>:8047`.
-
-### Logical Plan  
-
-A logical plan is a collection of logical operators that describe the work required to generate query results and define which data sources and operators to apply. The parser in Drill converts SQL operators into a logical operator syntax that Drill understands to create the logical plan. You can view the logical plan to see the planned operators. Modifying and resubmitting the logical plan to Drill (through submit_plan) is not very useful because Drill has not determined parallelization at this stage of planning.
-
-### Physical Plan  
-
-A physical plan describes the chosen physical execution plan for a query statement. The optimizer applies various types of rules to rearrange operators and functions into an optimal plan and then converts the logical plan into a physical plan that tells Drill how to execute the query.
- 
-You can review a physical plan to troubleshoot issues, modify the plan, and then submit the plan back to Drill. For example, if you run into a casting error or you want to change the join ordering of tables to see if the query runs faster. You can modify the physical plan to address the issue and then submit it back to Drill and run the query.
- 
-Drill transforms the physical plan into an execution tree of minor fragments that run simultaneously on the cluster to carry out execution tasks. See Query Execution. You can view the activity of the fragments that executed a query in the query profile. See Query Profiles.
-
-**Viewing the Physical Plan**  
-
-You can run the EXPLAIN command to view the physical plan for a query with or without costing formation. See EXPLAIN for Physical Plans and Costing Information. Analyze the cost-based query plan to identify the types of operators that Drill plans to use for the query and how much memory they will require. 
-
-Read the text output from bottom to top to understand the sequence of operators planned to execute the query. You can also view a visual representation of the physical plan in the Profile view of the Drill Web Console. See Query Profiles. You can modify the detailed JSON output, and submit it back to Drill through the Drill Web Console.
-
-The physical plan shows the major fragments and specific operators with correlating MajorFragmentIDs and OperatorIDs. See Operators. Major fragments are an abstract concept that represent a phase of the query execution. Major fragments do not perform any query tasks.
- 
-The physical plan displays the IDs in the following format:
- 
-`<MajorFragmentID> - <OperatorID>`
- 
-For example, 00-02 where 00 is the MajorFragmentID and 02 is is the OperatorID.
- 
-If you view the plan with costing information, you can see where the majority of resources, in terms of I/O, CPU, and memory, will be spent when Drill executes the query. If joining tables, your query plan should include broadcast joins.
-
-**Example EXPLAIN PLAN**
-  
-
-       0: jdbc:drill:zk=local> explain plan for select type t, count(distinct id) from dfs.`/home/donuts/donuts.json` where type='donut' group by type;
-       +------------+------------+
-       |   text    |   json    |
-       +------------+------------+
-       | 00-00 Screen
-       00-01   Project(t=[$0], EXPR$1=[$1])
-       00-02       Project(t=[$0], EXPR$1=[$1])
-       00-03       HashAgg(group=[{0}], EXPR$1=[COUNT($1)])
-       00-04           HashAgg(group=[{0, 1}])
-       00-05           SelectionVectorRemover
-       00-06               Filter(condition=[=($0, 'donut')])
-       00-07               Scan(groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`type`, `id`], files=[file:/home/donuts/donuts.json]]])...
-       …
-       
-         
-**Modifying and Submitting a Physical Plan to Drill**
-
-You can test the performance of a physical plan that Drill generates, modify the plan and then re-submit it to Drill. For example, you can modify the plan to change the join ordering of tables. You can also submit physical plans created outside of Drill through the Drill Web Console.
- 
-**Note:** Only advanced users who know about query planning should modify and re-submit a physical plan.
- 
-To modify and re-submit a physical plan to Drill, complete the following steps:  
-
-1. Run EXPLAIN PLAN FOR `<query>` to see the physical plan for your query.  
-2. Copy the JSON output of the physical plan, and modify as needed.  
-3. Navigate to the Drill Web Console at `<drill_node_ip_address>:8047`.  
-4. Select **Query** in the menu bar.  
-![]({{ site.baseurl }}/docs/img/submit_plan.png)  
-
-5. Select the **Physical Plan** radio button under Query Type.  
-6. Paste the physical plan into the Query field, and click **Submit**. Drill runs the plan and executes the query.
+---
+title: "Query Plans"
+date: 2016-11-21 22:28:40 UTC
+parent: "Identifying Performance Issues"
+---
+If you experience performance issues in Drill, you can typically identify the source of the issues in the query plans or profiles. This section describes the logical plan and physical plans.
+
+## Query Plans  
+
+Drill has an optimizer and a parallelizer that work together to plan a query. Drill creates logical, physical, and execution plans based on the available statistics for an associated set of files or data sources. The number of running Drill nodes and configured runtime settings contribute to how Drill plans and executes a query.
+ 
+You can use [EXPLAIN commands]({{ site.baseurl }}/docs/explain-commands/) to view the logical and physical plans for a query; however, you cannot view the execution plan. To see how Drill executed a query, you can view the query profile in the Drill Web Console at `<drill_node_ip_address>:8047`.
+
+### Logical Plan  
+
+A logical plan is a collection of logical operators that describe the work required to generate query results and define which data sources and operators to apply. The parser in Drill converts SQL operators into a logical operator syntax that Drill understands to create the logical plan. You can view the logical plan to see the planned operators. Modifying and resubmitting the logical plan to Drill (through submit_plan) is not very useful because Drill has not determined parallelization at this stage of planning.
+
+### Physical Plan  
+
+A physical plan describes the chosen physical execution plan for a query statement. The optimizer applies various types of rules to rearrange operators and functions into an optimal plan and then converts the logical plan into a physical plan that tells Drill how to execute the query.
+ 
+You can review a physical plan to troubleshoot issues, modify the plan, and then submit the plan back to Drill. For example, if you run into a casting error or want to change the join ordering of tables to see if the query runs faster, you can modify the physical plan to address the issue and then submit it back to Drill and run the query.
+ 
+Drill transforms the physical plan into an execution tree of minor fragments that run simultaneously on the cluster to carry out execution tasks. See Query Execution. You can view the activity of the fragments that executed a query in the query profile. See Query Profiles.
+
+**Viewing the Physical Plan**  
+
+You can run the EXPLAIN command to view the physical plan for a query with or without costing information. See EXPLAIN for Physical Plans and Costing Information. Analyze the cost-based query plan to identify the types of operators that Drill plans to use for the query and how much memory they will require. 
+
+Read the text output from bottom to top to understand the sequence of operators planned to execute the query. You can also view a visual representation of the physical plan in the Profile view of the Drill Web Console. See Query Profiles. You can modify the detailed JSON output, and submit it back to Drill through the Drill Web Console.
+
+The physical plan shows the major fragments and specific operators with correlating MajorFragmentIDs and OperatorIDs. See Operators. A major fragment is an abstract concept that represents a phase of the query execution. Major fragments do not perform any query tasks.
+ 
+The physical plan displays the IDs in the following format:
+ 
+`<MajorFragmentID> - <OperatorID>`
+ 
+For example, 00-02, where 00 is the MajorFragmentID and 02 is the OperatorID.
+ 
+If you view the plan with costing information, you can see where the majority of resources, in terms of I/O, CPU, and memory, will be spent when Drill executes the query. If joining tables, your query plan should include broadcast joins.
+
+**Example EXPLAIN PLAN**
+  
+
+       0: jdbc:drill:zk=local> explain plan for select type t, count(distinct id) from dfs.`/home/donuts/donuts.json` where type='donut' group by type;
+       +------------+------------+
+       |   text    |   json    |
+       +------------+------------+
+       | 00-00 Screen
+       00-01   Project(t=[$0], EXPR$1=[$1])
+       00-02       Project(t=[$0], EXPR$1=[$1])
+       00-03       HashAgg(group=[{0}], EXPR$1=[COUNT($1)])
+       00-04           HashAgg(group=[{0, 1}])
+       00-05           SelectionVectorRemover
+       00-06               Filter(condition=[=($0, 'donut')])
+       00-07               Scan(groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`type`, `id`], files=[file:/home/donuts/donuts.json]]])...
+       …
+       
+         
+**Modifying and Submitting a Physical Plan to Drill**
+
+You can test the performance of a physical plan that Drill generates, modify the plan and then re-submit it to Drill. For example, you can modify the plan to change the join ordering of tables. You can also submit physical plans created outside of Drill through the Drill Web Console.
+ 
+**Note:** Only advanced users who know about query planning should modify and re-submit a physical plan.
+ 
+To modify and re-submit a physical plan to Drill, complete the following steps:  
+
+1. Run EXPLAIN PLAN FOR `<query>` to see the physical plan for your query.  
+2. Copy the JSON output of the physical plan, and modify as needed.  
+3. Navigate to the Drill Web Console at `<drill_node_ip_address>:8047`.  
+4. Select **Query** in the menu bar.  
+![]({{ site.baseurl }}/docs/img/submit_plan.png)  
+
+5. Select the **Physical Plan** radio button under Query Type.  
+6. Paste the physical plan into the Query field, and click **Submit**. Drill runs the plan and executes the query.
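
The `<MajorFragmentID> - <OperatorID>` prefix described in 010-query-plans.md can be split mechanically when reviewing EXPLAIN text output. The following Python sketch is illustrative only and is not part of Drill or of this commit: `parse_plan` and `execution_order` are hypothetical helper names, and the regular expression assumes the text output looks like the sample EXPLAIN PLAN in the diff above.

```python
import re

# Matches plan lines such as "00-03       HashAgg(group=[{0}], ...)".
# Leading table borders or spaces ("| ", "   ") are skipped by \W*.
PLAN_LINE = re.compile(r"^\W*(\d+)-(\d+)\s+(\w+)")


def parse_plan(text):
    """Return (major_fragment_id, operator_id, operator_name) tuples, top to bottom."""
    operators = []
    for line in text.splitlines():
        match = PLAN_LINE.match(line)
        if match:
            operators.append((match.group(1), match.group(2), match.group(3)))
    return operators


def execution_order(text):
    """Return operators in the order the docs say to read them: bottom to top."""
    return list(reversed(parse_plan(text)))


# Text column copied from the sample EXPLAIN PLAN output in the docs.
SAMPLE = """\
| 00-00 Screen
00-01   Project(t=[$0], EXPR$1=[$1])
00-02       Project(t=[$0], EXPR$1=[$1])
00-03       HashAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-04           HashAgg(group=[{0, 1}])
00-05           SelectionVectorRemover
00-06               Filter(condition=[=($0, 'donut')])
00-07               Scan(groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`type`, `id`], files=[file:/home/donuts/donuts.json]]])...
"""

if __name__ == "__main__":
    for major, op_id, name in execution_order(SAMPLE):
        print(f"major fragment {major}, operator {op_id}: {name}")
```

Reading the list in reverse starts at the Scan and ends at the Screen, which matches the bottom-to-top reading order the page recommends.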

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/identifying-performance-issues/020-query-profiles.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/identifying-performance-issues/020-query-profiles.md b/_docs/performance-tuning/identifying-performance-issues/020-query-profiles.md
index ad7a948..5a27da7 100644
--- a/_docs/performance-tuning/identifying-performance-issues/020-query-profiles.md
+++ b/_docs/performance-tuning/identifying-performance-issues/020-query-profiles.md
@@ -1,143 +1,143 @@
----
-title: "Query Profiles"
-date:  
-parent: "Identifying Performance Issues"
----
-
-A profile is a summary of metrics collected for each query that Drill executes. Query profiles provide information that you can use to monitor and analyze query performance. Drill creates a query profile from major, minor, operator, and input stream profiles. Each major fragment profile consists of a list of minor fragment profiles. Each minor fragment profile consists of a list of operator profiles. An operator profile consists of a list of input stream profiles. 
-
-You can view aggregate statistics across profile lists in the Profile tab of the Drill Web Console at `<drill_node_ip_address>:8047`. You can modify and resubmit queries, or cancel queries. For debugging purposes, you can use profiles in conjunction with Drill logs. See Log and Debug.
- 
-Metrics in a query profile are associated with a coordinate system of IDs. Drill uses a coordinate system comprised of query, fragment, and operator identifiers to track query execution activities and resources. Drill assigns a unique QueryID to each query received and then assigns IDs to each fragment and operator that executes the query.
- 
-**Example IDs**
-
-QueryID: 2aa98add-15b3-e155-5669-603c03bfde86
- 
-Fragment and operator IDs:  
-
-![]({{ site.baseurl }}/docs/img/xx-xx-xx.png)  
-
-## Viewing a Query Profile  
-
-When you select the Profiles tab in the Drill Web Console at `<drill_node_ip_address>:8047`, you see a list of the last 100 queries than have run or that are currently running in the cluster.  
-
-![]({{ site.baseurl }}/docs/img/list_queries.png)
-
-
-You can click on any query to see its profile.  
-
-![]({{ site.baseurl }}/docs/img/query_profile.png)  
-
-When you select a profile, notice that the URL in the address bar contains the QueryID. For example, 2aa98add-15b3-e155-5669-603c03bfde86 in the following URL:
-
-       http://<drill_node>:8047/profiles/2aa98add-15b3-e155-5669-603c03bfde86
- 
-The Query Profile section in the Query profile summarizes a few key details about the query, including: 
- 
- * The state of the query, either running, completed, or failed.  
- * The node operating as the Foreman; the Drillbit that receives a query from the client or application becomes the Foreman and drives the entire query. 
- * The total number of minor fragments required to execute the query
-
-If you scroll down, you can see the Fragment Profiles and Operator Profiles sections. 
- 
-## Fragment Profiles  
-
-Fragment profiles section provides an overview table, and a major fragment block for each major fragment that executed the query. Each row in the Overview table provides the number of minor fragments that Drill parallelized from each major fragment, as well as aggregate time and memory metrics for the minor fragments.  
-
-![]({{ site.baseurl }}/docs/img/frag_profile.png)  
-
-See Major Fragment Profiles Table for column descriptions.
- 
-When you look at the fragment profiles, you may notice that some major fragments were parallelized into substantially fewer minor fragments, but happen to have the highest runtime.  Or, you may notice certain minor fragments have a higher peak memory than others. When you notice these variations in execution, you can delve deeper into the profile by looking at the major fragment blocks.
- 
-Below the Overview table are major fragment blocks. Each of these blocks corresponds to a row in the Overview table. You can expand the blocks to see metrics for all of the minor fragments that were parallelized from each major fragment, including the host on which each minor fragment ran. Each row in the major fragment table presents the fragment state, time metrics, memory metrics, and aggregate input metrics of each minor fragment.  
-
-![]({{ site.baseurl }}/docs/img/maj_frag_block.png)  
-
-When looking at the minor fragment metrics, verify the state of the fragment. A fragment can have a “failed” state which could indicate an issue on the host. If the query itself fails, an operator may have run out of memory. If fragments running on a particular node are under performing, there may be multi-tenancy issues that you can address.
- 
-You can also see a graph that illustrates the activity of major and minor fragments for the duration of the query.  
-
-![]({{ site.baseurl }}/docs/img/graph_1.png)  
-
-If you see “stair steps” in the graph, this indicates that the execution work of the fragments is not distributed evenly. Stair steps in the graph typically occur for non-local reads on data. To address this issue, you can increase data replication, rewrite the data, or file a JIRA to get help with the issue.
- 
-This graph correlates with the visualized plan graph in the Visualized Plan tab. Each color in the graph corresponds to the activity of one major fragment.  
-
-![]({{ site.baseurl }}/docs/img/vis_graph.png)  
-
-The visualized plan illustrates color-coded major fragments divided and labeled with the names of the operators used to complete each phase of the query. Exchange operators separate each major fragment. These operators represent a point where Drill can execute operations below them in parallel.  
-
-## Operator Profiles  
-
-Operator profiles describe each operator that performed relational operations during query execution. The Operator Profiles section provides an Overview table of the aggregate time and memory metrics for each operator within a major fragment.  
-
-![]({{ site.baseurl }}/docs/img/operator_table.png)  
-
-See Operator Profiles Table for column descriptions.
- 
-Identify the operations that consume a majority of time and memory. You can potentially modify options related to the specific operators to improve performance.
- 
-Below the Overview table are operator blocks, which you can expand to see metrics for each operator. Each of these blocks corresponds to a row in the Overview table. Each row in the Operator block presents time and memory metrics, as well as aggregate input metrics for each minor fragment.  
-
-![]({{ site.baseurl }}/docs/img/operator_block.png)  
-
-See Operator Block for column descriptions.
- 
-Drill uses batches of records as a basic unit of work. The batches are pipelined between each operation.  Record batches are no larger than 64k records. While the target size of one record batch is generally 256k, they can scale to many megabytes depending on the query plan and the width of the records.
-
-The Max Records number for each minor fragment should be almost equivalent. If one, or a very small number of minor fragments, perform the majority of the work, there may be data skew. To address data skew, you may need change settings related to table joins or partition data to balance the work.  
-
-### Data Skew Example
-The following query was run against TPC-DS data:
-
-       0: jdbc:drill:zk=local> select ss_customer_sk, count(*) as cnt from store_sales where ss_customer_sk is null or ss_customer_sk in (1, 2, 3, 4, 5) group by ss_customer_sk;
-       +-----------------+---------+
-       | ss_customer_sk  |   cnt   |
-       +-----------------+---------+
-       | null            | 129752  |
-       | 5               | 47      |
-       | 1               | 9       |
-       | 2               | 43      |
-       | 4               | 10      |
-       | 3               | 11      |
-       +-----------------+---------+
-       6 rows selected
- 
-In the result set, notice that the 'null' group has 129752 values while others have roughly similar values.  
-
-Looking at the operator profile for the hash aggregate in major fragment 00, you can see that out of 8 minor fragments, only minor fragment 1 is processing a substantially larger number of records when compared to the other minor fragments.  
-
-![]({{ site.baseurl }}/docs/img/data_skew.png)  
-
-In this example, there is inherent skew present in the data. Other types of skew may not strictly be data dependent, but can be introduced by a sub-optimal hash function or other issues in the product. In either case, examining the query profile helps understand why a query is slow. In the first scenario, it may be possible to run separate queries for the skewed and non-skewed values. In the second scenario, it is better to seek technical support.  
-
-## Physical Plan View  
-
-The physical plan view provides statistics about the actual cost of the query operations in terms of memory, I/O, and CPU processing. You can use this profile to identify which operations consumed the majority of the resources during a query, modify the physical plan to address the cost-intensive operations, and submit the updated plan back to Drill. See [Costing Information]({{ site.baseurl }}/docs/explain/#costing-information).  
-
-![]({{ site.baseurl }}/docs/img/phys_plan_profile.png)  
-
-## Canceling a Query  
-
-You may want to cancel a query if it hangs or causes performance bottlenecks. You can cancel a query in the Profile tab of the Drill Web Console.
- 
-To cancel a query from the Drill Web Console, complete the following steps:  
-
-1. Navigate to the Drill Web Console at `<drill_node_ip_address>:8047`.
-The Drill node from which you access the Drill Web Console must have an active Drillbit running.
-2. Select Profiles in the toolbar.
-A list of running and completed queries appears.
-3. Click the query for which you want to see the profile.
-4. Select **Edit Query**.
-5. Click **Cancel** query to cancel the query.  
-
-The following message appears:  
-
-       Cancelled query <QueryID\>
-
-
-
-
-
+---
+title: "Query Profiles"
+date: 2016-11-21 22:28:41 UTC
+parent: "Identifying Performance Issues"
+---
+
+A profile is a summary of metrics collected for each query that Drill executes. Query profiles provide information that you can use to monitor and analyze query performance. Drill creates a query profile from major, minor, operator, and input stream profiles. Each major fragment profile consists of a list of minor fragment profiles. Each minor fragment profile consists of a list of operator profiles. An operator profile consists of a list of input stream profiles. 
+
+You can view aggregate statistics across profile lists in the Profile tab of the Drill Web Console at `<drill_node_ip_address>:8047`. You can modify and resubmit queries, or cancel queries. For debugging purposes, you can use profiles in conjunction with Drill logs. See Log and Debug.
+ 
+Metrics in a query profile are associated with a coordinate system of IDs. Drill uses a coordinate system comprised of query, fragment, and operator identifiers to track query execution activities and resources. Drill assigns a unique QueryID to each query received and then assigns IDs to each fragment and operator that executes the query.
+ 
+**Example IDs**
+
+QueryID: 2aa98add-15b3-e155-5669-603c03bfde86
+ 
+Fragment and operator IDs:  
+
+![]({{ site.baseurl }}/docs/img/xx-xx-xx.png)  
+
+## Viewing a Query Profile  
+
+When you select the Profiles tab in the Drill Web Console at `<drill_node_ip_address>:8047`, you see a list of the last 100 queries that have run or that are currently running in the cluster.  
+
+![]({{ site.baseurl }}/docs/img/list_queries.png)
+
+
+You can click on any query to see its profile.  
+
+![]({{ site.baseurl }}/docs/img/query_profile.png)  
+
+When you select a profile, notice that the URL in the address bar contains the QueryID. For example, 2aa98add-15b3-e155-5669-603c03bfde86 in the following URL:
+
+       http://<drill_node>:8047/profiles/2aa98add-15b3-e155-5669-603c03bfde86
+ 
+The Query Profile section in the Query profile summarizes a few key details about the query, including: 
+ 
+ * The state of the query, either running, completed, or failed.  
+ * The node operating as the Foreman; the Drillbit that receives a query from the client or application becomes the Foreman and drives the entire query. 
+ * The total number of minor fragments required to execute the query
+
+If you scroll down, you can see the Fragment Profiles and Operator Profiles sections. 
+ 
+## Fragment Profiles  
+
+The Fragment Profiles section provides an overview table, and a major fragment block for each major fragment that executed the query. Each row in the Overview table provides the number of minor fragments that Drill parallelized from each major fragment, as well as aggregate time and memory metrics for the minor fragments.  
+
+![]({{ site.baseurl }}/docs/img/frag_profile.png)  
+
+See Major Fragment Profiles Table for column descriptions.
+ 
+When you look at the fragment profiles, you may notice that some major fragments were parallelized into substantially fewer minor fragments, but happen to have the highest runtime.  Or, you may notice certain minor fragments have a higher peak memory than others. When you notice these variations in execution, you can delve deeper into the profile by looking at the major fragment blocks.
+ 
+Below the Overview table are major fragment blocks. Each of these blocks corresponds to a row in the Overview table. You can expand the blocks to see metrics for all of the minor fragments that were parallelized from each major fragment, including the host on which each minor fragment ran. Each row in the major fragment table presents the fragment state, time metrics, memory metrics, and aggregate input metrics of each minor fragment.  
+
+![]({{ site.baseurl }}/docs/img/maj_frag_block.png)  
+
+When looking at the minor fragment metrics, verify the state of the fragment. A fragment can have a “failed” state, which could indicate an issue on the host. If the query itself fails, an operator may have run out of memory. If fragments running on a particular node are underperforming, there may be multi-tenancy issues that you can address.
+ 
+You can also see a graph that illustrates the activity of major and minor fragments for the duration of the query.  
+
+![]({{ site.baseurl }}/docs/img/graph_1.png)  
+
+If you see “stair steps” in the graph, this indicates that the execution work of the fragments is not distributed evenly. Stair steps in the graph typically occur for non-local reads on data. To address this issue, you can increase data replication, rewrite the data, or file a JIRA to get help with the issue.
+ 
+This graph correlates with the visualized plan graph in the Visualized Plan tab. Each color in the graph corresponds to the activity of one major fragment.  
+
+![]({{ site.baseurl }}/docs/img/vis_graph.png)  
+
+The visualized plan illustrates color-coded major fragments divided and labeled with the names of the operators used to complete each phase of the query. Exchange operators separate each major fragment. These operators represent a point where Drill can execute operations below them in parallel.  
+
+## Operator Profiles  
+
+Operator profiles describe each operator that performed relational operations during query execution. The Operator Profiles section provides an Overview table of the aggregate time and memory metrics for each operator within a major fragment.  
+
+![]({{ site.baseurl }}/docs/img/operator_table.png)  
+
+See Operator Profiles Table for column descriptions.
+ 
+Identify the operations that consume a majority of time and memory. You can potentially modify options related to the specific operators to improve performance.
+ 
+Below the Overview table are operator blocks, which you can expand to see metrics for each operator. Each of these blocks corresponds to a row in the Overview table. Each row in the Operator block presents time and memory metrics, as well as aggregate input metrics for each minor fragment.  
+
+![]({{ site.baseurl }}/docs/img/operator_block.png)  
+
+See Operator Block for column descriptions.
+ 
+Drill uses batches of records as a basic unit of work. The batches are pipelined between each operation.  Record batches are no larger than 64k records. While the target size of one record batch is generally 256k, they can scale to many megabytes depending on the query plan and the width of the records.
+
+The Max Records number for each minor fragment should be almost equivalent. If one, or a very small number of minor fragments, perform the majority of the work, there may be data skew. To address data skew, you may need to change settings related to table joins or partition data to balance the work.  
+
+### Data Skew Example
+The following query was run against TPC-DS data:
+
+       0: jdbc:drill:zk=local> select ss_customer_sk, count(*) as cnt from store_sales where ss_customer_sk is null or ss_customer_sk in (1, 2, 3, 4, 5) group by ss_customer_sk;
+       +-----------------+---------+
+       | ss_customer_sk  |   cnt   |
+       +-----------------+---------+
+       | null            | 129752  |
+       | 5               | 47      |
+       | 1               | 9       |
+       | 2               | 43      |
+       | 4               | 10      |
+       | 3               | 11      |
+       +-----------------+---------+
+       6 rows selected
+ 
+In the result set, notice that the 'null' group has 129752 values while the other groups have far fewer, roughly similar counts (9 to 47).  
+
+Looking at the operator profile for the hash aggregate in major fragment 00, you can see that out of 8 minor fragments, only minor fragment 1 is processing a substantially larger number of records when compared to the other minor fragments.  
+
+![]({{ site.baseurl }}/docs/img/data_skew.png)  
+
+In this example, there is inherent skew present in the data. Other types of skew may not strictly be data dependent, but can be introduced by a sub-optimal hash function or other issues in the product. In either case, examining the query profile helps understand why a query is slow. In the first scenario, it may be possible to run separate queries for the skewed and non-skewed values. In the second scenario, it is better to seek technical support.  
+
+## Physical Plan View  
+
+The physical plan view provides statistics about the actual cost of the query operations in terms of memory, I/O, and CPU processing. You can use this profile to identify which operations consumed the majority of the resources during a query, modify the physical plan to address the cost-intensive operations, and submit the updated plan back to Drill. See [Costing Information]({{ site.baseurl }}/docs/explain/#costing-information).  
+
+![]({{ site.baseurl }}/docs/img/phys_plan_profile.png)  
+
+## Canceling a Query  
+
+You may want to cancel a query if it hangs or causes performance bottlenecks. You can cancel a query in the Profile tab of the Drill Web Console.
+ 
+To cancel a query from the Drill Web Console, complete the following steps:  
+
+1. Navigate to the Drill Web Console at `<drill_node_ip_address>:8047`.
+The Drill node from which you access the Drill Web Console must have an active Drillbit running.
+2. Select Profiles in the toolbar.
+A list of running and completed queries appears.
+3. Click the query that you want to cancel.
+4. Select **Edit Query**.
+5. Click **Cancel** query to cancel the query.  
+
+The following message appears:  
+
+       Cancelled query <QueryID>
+
+
+
+
+
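Returning to the Data Skew Example above: one way to run separate queries for the skewed and non-skewed values is to count the null group on its own, without grouping on the skewed key, and to group only the remaining rows. The following is a minimal sketch that reuses the `store_sales` table and `ss_customer_sk` column from that example. Whether it actually balances the work depends on the data and on the plan Drill chooses, so check the query profile again afterward.

```sql
-- Skewed value: count the null group on its own, without GROUP BY.
select count(*) as cnt
from store_sales
where ss_customer_sk is null;

-- Non-skewed values: group only the rows that have a customer key.
select ss_customer_sk, count(*) as cnt
from store_sales
where ss_customer_sk in (1, 2, 3, 4, 5)
group by ss_customer_sk;
```

Taken together, the two result sets cover the same rows as the single query in the example.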

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/performance-tuning-reference/010-query-profile-tables.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/performance-tuning-reference/010-query-profile-tables.md b/_docs/performance-tuning/performance-tuning-reference/010-query-profile-tables.md
index c1ea63d..956bf14 100644
--- a/_docs/performance-tuning/performance-tuning-reference/010-query-profile-tables.md
+++ b/_docs/performance-tuning/performance-tuning-reference/010-query-profile-tables.md
@@ -1,85 +1,85 @@
----
-title: "Query Profile Column Descriptions"
-date:  
-parent: "Performance Tuning Reference"
---- 
-
-The following tables provide descriptions listed in each of the tables for a query profile.  
-
-
-## Fragment Overview  Table  
-
-Shows aggregate metrics for each major fragment that executed the query.
-
-The following table lists descriptions for each column in the Fragment Overview  
-table:  
-
-| Column Name               | Description                                                                                                                                                                 |
-|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Major Fragment ID         | The coordinate ID of the major fragment. For example, 03-xx-xx where 03 is the major fragment ID followed by xx-xx, which represents the minor fragment ID and operator ID. |
-| Minor Fragments Reporting | The number of minor fragments that Drill parallelized for the major fragment.                                                                                               |
-| First Start               | The total time before the first minor fragment started its task.                                                                                                            |
-| Last Start                | The total time before the last minor fragment started its task.                                                                                                             |
-| First End                 | The total time for the first minor fragment to finish its task.                                                                                                             |
-| Last End                  | The total time for the last minor fragment to finish its task.                                                                                                              |
-| Min Runtime               | The minimum of the total amount of time spent by minor fragments to complete their tasks.                                                                                   |
-| Avg Runtime               | The average of the total amount of time spent by minor fragments to complete their tasks.                                                                                   |
-| Max Runtime               | The maximum of the total amount of time spent by minor fragments to complete their tasks.                                                                                   |
-| Last Update               | The last time one of the minor fragments sent a status update to the Foreman. Time is shown in 24-hour notation.                                                            |
-| Last Progress             | The last time one of the minor fragments made progress, such as a change in fragment state or read data from disk. Time is shown in 24-hour notation.                       |
-| Max Peak Memory           | The maximum of the peak direct memory allocated to any minor fragment.                                                                                                      |
-
-## Major Fragment Block  
-
-Shows metrics for the minor fragments that were parallelized for each major fragment.  
-
-The following table lists descriptions for each column in a major fragment block:  
-
-| Column Name       | Description                                                                                                                                                                                                        |
-|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Minor Fragment ID | The coordinate ID of the minor fragment that was parallelized from the major fragment. For example, 02-03-xx where 02 is the Major Fragment ID, 03 is the Minor Fragment ID, and xx corresponds to an operator ID. |
-| Host              | The node on which the minor fragment carried out its task.                                                                                                                                                         |
-| Start             | The amount of time passed before the minor fragment started its task.                                                                                                                                              |
-| End               | The amount of time passed before the minor fragment finished its task.                                                                                                                                             |
-| Runtime           | The duration of time for the fragment to complete a task. This value equals the difference between End and Start time.                                                                                             |
-| Max Records       | The maximum number of records consumed by an operator from a single input stream.                                                                                                                                  |
-| Max Batches       | The maximum number of input batches across input streams, operators, and minor fragments.                                                                                                                          |
-| Last Update       | The last time this fragment sent a status update to the Foreman. Time is shown in 24-hour notation.                                                                                                                |
-| Last Progress     | The last time this fragment made progress, such as a change in fragment state or reading data from disk. Time is shown in 24-hour notation.                                                                        |
-| Peak Memory       | The peak direct memory allocated during execution for this minor fragment.                                                                                                                                         |
-| State             | The status of the minor fragment; either finished, running, cancelled, or failed.                                                                                                                                  |
-
-
-## Operator Overview  Table  
-
-Shows aggregate metrics for each operator within a major fragment that performed relational operations during query execution.
- 
-The following table lists descriptions for each column in the Operator Overview table:
-
-| Column Name                                          | Description                                                                                                                                                                                                                   |
-|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Operator ID                                          | The coordinates of an operator that performed an operation during a particular phase of the query. For example, 02-xx-03 where 02 is the Major Fragment ID, xx corresponds to a Minor Fragment ID, and 03 is the Operator ID. |
-| Type                                                 | The operator type. Operators can be of type project, filter, hash join, single sender, or unordered receiver.                                                                                                                 |
-| Min Setup Time, Avg Setup Time, Max Setup Time       | The minimum, average, and maximum amount of time spent by the operator to set up before performing the operation.                                                                                                             |
-| Min Process Time, Avg Process Time, Max Process Time | The minimum, average, and maximum  amount of time spent by the operator to perform the operation.                                                                                                                             |
-| Wait (min, avg, max)                                 | These fields represent the minimum, average,  and maximum cumulative times spent by operators waiting for external resources.                                                                                                 |
-| Avg Peak Memory                                      | Represents the average of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort.                                          |
-| Max Peak Memory                                      | Represents the maximum of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as  hash join or sort.                                         |  
-
-## Operator Block  
-
-Shows time and memory metrics for each operator type within a major fragment.  
-
-The following table provides descriptions for each column presented in the operator block:  
-
-| Column Name    | Description                                                                                                                                                                                              |
-|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Minor Fragment | The coordinate ID of the minor fragment on which the operator ran. For example, 04-03-01 where 04 is the Major Fragment ID, 03 is the Minor Fragment ID, and 01 is the Operator ID.                      |
-| Setup Time     | The amount of time spent by the operator to set up before performing its operation. This includes run-time code generation and opening a file.                                                           |
-| Process Time   | The amount of time spent by the operator to perform its operation.                                                                                                                                       |
-| Wait Time      | The cumulative amount of time spent by an operator waiting for external resources. such as waiting to send records, waiting to receive records, waiting to write to disk, and waiting to read from disk. |
-| Max Batches    | The maximum number of record batches consumed from a single input stream.                                                                                                                                |
-| Max Records    | The maximum number of records consumed from a single input stream.                                                                                                                                       |
-| Peak Memory    | Represents the peak direct memory allocated. Relates to the memory needed by the operators to perform their operations, such as  hash join and sort.                                                     |  
-
-
+---
+title: "Query Profile Column Descriptions"
+date: 2016-11-21 22:28:41 UTC
+parent: "Performance Tuning Reference"
+--- 
+
+The following tables describe the columns listed in each of the tables of a query profile.  
+
+
+## Fragment Overview Table  
+
+Shows aggregate metrics for each major fragment that executed the query.
+
+The following table lists descriptions for each column in the Fragment Overview
+table:  
+
+| Column Name               | Description                                                                                                                                                                 |
+|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Major Fragment ID         | The coordinate ID of the major fragment. For example, 03-xx-xx where 03 is the major fragment ID followed by xx-xx, which represents the minor fragment ID and operator ID. |
+| Minor Fragments Reporting | The number of minor fragments that Drill parallelized for the major fragment.                                                                                               |
+| First Start               | The total time before the first minor fragment started its task.                                                                                                            |
+| Last Start                | The total time before the last minor fragment started its task.                                                                                                             |
+| First End                 | The total time for the first minor fragment to finish its task.                                                                                                             |
+| Last End                  | The total time for the last minor fragment to finish its task.                                                                                                              |
+| Min Runtime               | The minimum of the total amount of time spent by minor fragments to complete their tasks.                                                                                   |
+| Avg Runtime               | The average of the total amount of time spent by minor fragments to complete their tasks.                                                                                   |
+| Max Runtime               | The maximum of the total amount of time spent by minor fragments to complete their tasks.                                                                                   |
+| Last Update               | The last time one of the minor fragments sent a status update to the Foreman. Time is shown in 24-hour notation.                                                            |
+| Last Progress             | The last time one of the minor fragments made progress, such as a change in fragment state or reading data from disk. Time is shown in 24-hour notation.                    |
+| Max Peak Memory           | The maximum of the peak direct memory allocated to any minor fragment.                                                                                                      |
+
+## Major Fragment Block  
+
+Shows metrics for the minor fragments that were parallelized for each major fragment.  
+
+The following table lists descriptions for each column in a major fragment block:  
+
+| Column Name       | Description                                                                                                                                                                                                        |
+|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Minor Fragment ID | The coordinate ID of the minor fragment that was parallelized from the major fragment. For example, 02-03-xx where 02 is the Major Fragment ID, 03 is the Minor Fragment ID, and xx corresponds to an operator ID. |
+| Host              | The node on which the minor fragment carried out its task.                                                                                                                                                         |
+| Start             | The amount of time passed before the minor fragment started its task.                                                                                                                                              |
+| End               | The amount of time passed before the minor fragment finished its task.                                                                                                                                             |
+| Runtime           | The duration of time for the fragment to complete a task. This value equals the difference between End and Start time.                                                                                             |
+| Max Records       | The maximum number of records consumed by an operator from a single input stream.                                                                                                                                  |
+| Max Batches       | The maximum number of input batches across input streams, operators, and minor fragments.                                                                                                                          |
+| Last Update       | The last time this fragment sent a status update to the Foreman. Time is shown in 24-hour notation.                                                                                                                |
+| Last Progress     | The last time this fragment made progress, such as a change in fragment state or reading data from disk. Time is shown in 24-hour notation.                                                                        |
+| Peak Memory       | The peak direct memory allocated during execution for this minor fragment.                                                                                                                                         |
+| State             | The status of the minor fragment; either finished, running, cancelled, or failed.                                                                                                                                  |
+
+
+## Operator Overview Table  
+
+Shows aggregate metrics for each operator within a major fragment that performed relational operations during query execution.
+ 
+The following table lists descriptions for each column in the Operator Overview table:
+
+| Column Name                                          | Description                                                                                                                                                                                                                   |
+|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Operator ID                                          | The coordinates of an operator that performed an operation during a particular phase of the query. For example, 02-xx-03 where 02 is the Major Fragment ID, xx corresponds to a Minor Fragment ID, and 03 is the Operator ID. |
+| Type                                                 | The operator type. Operators can be of type project, filter, hash join, single sender, or unordered receiver.                                                                                                                 |
+| Min Setup Time, Avg Setup Time, Max Setup Time       | The minimum, average, and maximum amount of time spent by the operator to set up before performing the operation.                                                                                                             |
+| Min Process Time, Avg Process Time, Max Process Time | The minimum, average, and maximum amount of time spent by the operator to perform the operation.                                                                                                                              |
+| Wait (min, avg, max)                                 | These fields represent the minimum, average, and maximum cumulative times spent by operators waiting for external resources.                                                                                                  |
+| Avg Peak Memory                                      | Represents the average of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort.                                          |
+| Max Peak Memory                                      | Represents the maximum of the peak direct memory allocated across minor fragments. Relates to the memory needed by operators to perform their operations, such as hash join or sort.                                          |  
+
+## Operator Block  
+
+Shows time and memory metrics for each operator type within a major fragment.  
+
+The following table provides descriptions for each column presented in the operator block:  
+
+| Column Name    | Description                                                                                                                                                                                              |
+|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Minor Fragment | The coordinate ID of the minor fragment on which the operator ran. For example, 04-03-01 where 04 is the Major Fragment ID, 03 is the Minor Fragment ID, and 01 is the Operator ID.                      |
+| Setup Time     | The amount of time spent by the operator to set up before performing its operation. This includes run-time code generation and opening a file.                                                           |
+| Process Time   | The amount of time spent by the operator to perform its operation.                                                                                                                                       |
+| Wait Time      | The cumulative amount of time spent by an operator waiting for external resources, such as waiting to send records, waiting to receive records, waiting to write to disk, and waiting to read from disk. |
+| Max Batches    | The maximum number of record batches consumed from a single input stream.                                                                                                                                |
+| Max Records    | The maximum number of records consumed from a single input stream.                                                                                                                                       |
+| Peak Memory    | Represents the peak direct memory allocated. Relates to the memory needed by the operators to perform their operations, such as hash join and sort.                                                      |  
+
+

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/query-plans-and-tuning/010-query-plans-and-tuning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/010-query-plans-and-tuning-introduction.md b/_docs/performance-tuning/query-plans-and-tuning/010-query-plans-and-tuning-introduction.md
index a6bddd9..771a4a4 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/010-query-plans-and-tuning-introduction.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/010-query-plans-and-tuning-introduction.md
@@ -1,8 +1,8 @@
----
-title: "Query Plans and Tuning Introduction"
-date:  
-parent: "Query Plans and Tuning"
----
-
-You can modify several options that affect how Drill plans a query.  This section describes some options that you can modify to improve performance.  
-
+---
+title: "Query Plans and Tuning Introduction"
+date: 2016-11-21 22:28:42 UTC
+parent: "Query Plans and Tuning"
+---
+
+You can modify several options that affect how Drill plans a query.  This section describes some options that you can modify to improve performance.  
+

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/query-plans-and-tuning/020-join-planning-guidelines.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/020-join-planning-guidelines.md b/_docs/performance-tuning/query-plans-and-tuning/020-join-planning-guidelines.md
index 277a5bb..a9a7aa6 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/020-join-planning-guidelines.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/020-join-planning-guidelines.md
@@ -1,44 +1,44 @@
----
-title: "Join Planning Guidelines"
-date:  
-parent: "Query Plans and Tuning"
---- 
-
-Drill uses distributed and broadcast joins to join tables. You can modify configuration settings in Drill to control how Drill plans joins in a query.
-
-## Distributed Joins
-For a distributed join, both sides of the join are hash distributed using one of the hash-based distribution operators on the join key. See Operators. 
-
-If there are multiple join keys from each table, Drill considers the two following types of plans:  
-1. A plan where data is distributed on all keys.  
-2. A plan where data is distributed on each individual key.  
- 
-For a merge join, Drill sorts both sides of the join after performing the hash distribution. Drill can distribute both sides of a hash join or merge join, but cannot do so for a nested loop join. 
-
-## Broadcast Joins
-In a broadcast join, all of the selected records of one file are broadcast to the file on all other nodes before the join is performed. The inner side of the join is broadcast while the outer side is kept as-is without any re-distribution. The estimated cardinality of the inner child must be below the planner.broadcast_threshold parameter in order to be eligible for broadcast.  Drill can use broadcast joins for hash, merge, and nested loop joins.
- 
-A broadcast join is useful when a large (fact) table is being joined to a relatively smaller (dimension) table. If the fact table is stored as many files in the distributed file system, instead of re-distributing the fact table over the network, it may be substantially cheaper to broadcast the inner side.  However, the broadcast sends the same data to all other nodes in the cluster.  Depending on the size of the cluster and the size of the data, it may not be the most efficient policy in some situations.
- 
-### Broadcast Join Options
-You can increase the size and affinity for Drill to use broadcast joins with the ALTER SYSTEM or ALTER SESSION commands and options. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
-
-The following configuration options in Drill control broadcast join behavior:  
-
-* **planner.broadcast_factor** 
-
-     Controls the cost of doing a broadcast when performing a join.  The lower the setting, the cheaper it is to do a broadcast join compared to other types of distribution for a join, such as a hash distribution.  
-
-     Default:1 Range: 0-1.7976931348623157e+308
-
-* **planner.enable\_broadcast_join**  
-
-     Changes the state of aggregation and join operators. The broadcast join can be used for hash join, merge join, and nested loop join. Use to join a large (fact) table to relatively smaller (dimension) tables.  
-
-     Default: true 
-
-* **planner.broadcast_threshold**  
-
-    Threshold, in terms of a number of rows, that determines whether a broadcast join is chosen for a query. Regardless of the setting of the broadcast_join option (enabled or disabled), a broadcast join is not chosen unless the right side of the join is estimated to contain fewer rows than this threshold. The intent of this option is to avoid broadcasting too many rows for join purposes. Broadcasting involves sending data across nodes and is a network-intensive operation. (The "right side" of the join, which may itself be a join or simply a table, is determined by cost-based optimizations and heuristics during physical planning.)  
-    
-    Default: 10000000 Range: 0-2147483647
+---
+title: "Join Planning Guidelines"
+date: 2016-11-21 22:28:42 UTC
+parent: "Query Plans and Tuning"
+--- 
+
+Drill uses distributed and broadcast joins to join tables. You can modify configuration settings in Drill to control how Drill plans joins in a query.
+
+## Distributed Joins
+For a distributed join, both sides of the join are hash distributed using one of the hash-based distribution operators on the join key. See Operators. 
+
+If there are multiple join keys from each table, Drill considers the following two types of plans:  
+1. A plan where data is distributed on all keys.  
+2. A plan where data is distributed on each individual key.  
+ 
+For a merge join, Drill sorts both sides of the join after performing the hash distribution. Drill can distribute both sides of a hash join or merge join, but cannot do so for a nested loop join. 
+
+## Broadcast Joins
+In a broadcast join, all of the selected records of one file are broadcast to the file on all other nodes before the join is performed. The inner side of the join is broadcast while the outer side is kept as-is without any re-distribution. The estimated cardinality of the inner child must be below the planner.broadcast_threshold parameter in order to be eligible for broadcast.  Drill can use broadcast joins for hash, merge, and nested loop joins.
+ 
+A broadcast join is useful when a large (fact) table is being joined to a relatively smaller (dimension) table. If the fact table is stored as many files in the distributed file system, instead of re-distributing the fact table over the network, it may be substantially cheaper to broadcast the inner side.  However, the broadcast sends the same data to all other nodes in the cluster.  Depending on the size of the cluster and the size of the data, it may not be the most efficient policy in some situations.
+ 
+### Broadcast Join Options
+You can make Drill more likely to choose broadcast joins, and raise the row limit for them, by setting options with the ALTER SYSTEM or ALTER SESSION commands. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
+
+The following configuration options in Drill control broadcast join behavior:  
+
+* **planner.broadcast_factor** 
+
+     Controls the cost of doing a broadcast when performing a join.  The lower the setting, the cheaper it is to do a broadcast join compared to other types of distribution for a join, such as a hash distribution.  
+
+     Default: 1 Range: 0-1.7976931348623157e+308
+
+* **planner.enable\_broadcast_join**  
+
+     Enables or disables broadcast joins. The broadcast join can be used for hash join, merge join, and nested loop join. Use to join a large (fact) table to relatively smaller (dimension) tables.  
+
+     Default: true 
+
+* **planner.broadcast_threshold**  
+
+    Threshold, in terms of a number of rows, that determines whether a broadcast join is chosen for a query. Regardless of the setting of the broadcast_join option (enabled or disabled), a broadcast join is not chosen unless the right side of the join is estimated to contain fewer rows than this threshold. The intent of this option is to avoid broadcasting too many rows for join purposes. Broadcasting involves sending data across nodes and is a network-intensive operation. (The "right side" of the join, which may itself be a join or simply a table, is determined by cost-based optimizations and heuristics during physical planning.)  
+    
+    Default: 10000000 Range: 0-2147483647
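The eligibility rule described for planner.broadcast_threshold can be sketched in Python. This is a simplified model for illustration only, not Drill's planner code: the function name and parameters are hypothetical, and it ignores the cost comparison that planner.broadcast_factor influences.

```python
def broadcast_eligible(estimated_inner_rows: int,
                       broadcast_threshold: int = 10_000_000,
                       enable_broadcast_join: bool = True) -> bool:
    """Return True if the inner (right) side may be considered for broadcast.

    Hypothetical helper: models only the row-count rule from the option
    descriptions, not the cost-based choice between plans.
    """
    if not enable_broadcast_join:
        # With the option disabled, broadcast joins are never considered.
        return False
    # The estimate must be strictly below the threshold to qualify.
    return estimated_inner_rows < broadcast_threshold


if __name__ == "__main__":
    print(broadcast_eligible(50_000))        # small dimension table
    print(broadcast_eligible(25_000_000))    # too large at the default threshold
```

Passing the check only makes the broadcast plan a candidate; the planner still compares its cost with that of a hash distribution.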

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/query-plans-and-tuning/030-guidelines-for-optimizing-aggregation.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/030-guidelines-for-optimizing-aggregation.md b/_docs/performance-tuning/query-plans-and-tuning/030-guidelines-for-optimizing-aggregation.md
index 33d49b3..79fcd74 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/030-guidelines-for-optimizing-aggregation.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/030-guidelines-for-optimizing-aggregation.md
@@ -1,22 +1,22 @@
----
-title: "Guidelines for Optimizing Aggregation"
-date:  
-parent: "Query Plans and Tuning"
---- 
-
-
-For queries that contain GROUP BY, Drill performs aggregations in either 1 or 2 phases.  In both of these schemes, Drill can use the Hash Aggregate and Streaming Aggregate physical operators.  The default behavior in Drill is to perform 2 phase aggregation.  
- 
-In the 2 phase aggregation scheme, each minor fragment performs local (partial) aggregation in phase 1.  It then sends the partially aggregated results to other fragments using a hash-based distribution operator.  The hash distribution is done on the GROUP BY keys.  In phase 2 all of the fragments perform a total aggregation using data received from phase 1.  
- 
-The 2 phase aggregation scheme is very efficient when the data contains grouping keys with a reasonable number of duplicate values such that doing the grouping reduces the number of rows sent to downstream operators.  However, if there is not much reduction it is best to use 1 phase aggregation.   
- 
-For example, suppose the query does a GROUP BY x, y.  If the combination of {x, y} values is unique (or nearly unique) in all of the rows of the input data, then there is no reduction in the number of rows when performing the grouping.  In this case, performance improves by doing 1 phase aggregation.  
- 
-You can use the ALTER SYSTEM or ALTER SESSION commands with the following option to control aggregation in Drill:
-
-*  planner.enable\_multiphase\_agg 
-
- 
-The default for this option is `true`.Typically, you set the options at the session level unless you want the setting to persist across all sessions.
- 
+---
+title: "Guidelines for Optimizing Aggregation"
+date: 2016-11-21 22:28:43 UTC
+parent: "Query Plans and Tuning"
+--- 
+
+
+For queries that contain GROUP BY, Drill performs aggregations in either 1 or 2 phases.  In both of these schemes, Drill can use the Hash Aggregate and Streaming Aggregate physical operators.  The default behavior in Drill is to perform 2 phase aggregation.  
+ 
+In the 2 phase aggregation scheme, each minor fragment performs local (partial) aggregation in phase 1.  It then sends the partially aggregated results to other fragments using a hash-based distribution operator.  The hash distribution is done on the GROUP BY keys.  In phase 2 all of the fragments perform a total aggregation using data received from phase 1.  
+ 
+The 2 phase aggregation scheme is very efficient when the data contains grouping keys with a reasonable number of duplicate values such that doing the grouping reduces the number of rows sent to downstream operators.  However, if there is not much reduction it is best to use 1 phase aggregation.   
+ 
+For example, suppose the query does a GROUP BY x, y.  If the combination of {x, y} values is unique (or nearly unique) in all of the rows of the input data, then there is no reduction in the number of rows when performing the grouping.  In this case, performance improves by doing 1 phase aggregation.  
+ 
+You can use the ALTER SYSTEM or ALTER SESSION commands with the following option to control aggregation in Drill:
+
+*  planner.enable\_multiphase\_agg 
+
+ 
+The default for this option is `true`. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
+ 
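The trade-off between 1 phase and 2 phase aggregation can be illustrated with a small Python model of a `COUNT(*) ... GROUP BY key` query. This is a sketch under stated assumptions, not Drill code: each inner list stands for the grouping-key values seen by one minor fragment, and the function names are hypothetical. It counts how many rows would cross the hash distribution in each scheme.

```python
from collections import Counter


def one_phase_count(fragments):
    """Distribute every raw row by key, then aggregate once."""
    shuffled_rows = sum(len(rows) for rows in fragments)
    totals = Counter(key for rows in fragments for key in rows)
    return dict(totals), shuffled_rows


def two_phase_count(fragments):
    """Phase 1: partial counts per fragment. Phase 2: merge partials by key."""
    partials = [Counter(rows) for rows in fragments]
    # Only one partially aggregated row per distinct key leaves each fragment.
    shuffled_rows = sum(len(partial) for partial in partials)
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # adds the partial counts key by key
    return dict(totals), shuffled_rows


if __name__ == "__main__":
    duplicated = [["a", "a", "b"], ["a", "b", "b"]]
    unique = [["a", "b", "c"], ["d", "e", "f"]]
    print(one_phase_count(duplicated)[1], two_phase_count(duplicated)[1])
    print(one_phase_count(unique)[1], two_phase_count(unique)[1])
```

With duplicated keys the 2 phase scheme distributes fewer rows. With unique keys it distributes the same number of rows and still pays for the local aggregation, which is the case where 1 phase aggregation is preferable.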

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/query-plans-and-tuning/040-modifying-query-planning-options.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/040-modifying-query-planning-options.md b/_docs/performance-tuning/query-plans-and-tuning/040-modifying-query-planning-options.md
index 03c2514..308c7cf 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/040-modifying-query-planning-options.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/040-modifying-query-planning-options.md
@@ -1,31 +1,31 @@
----
-title: "Modifying Query Planning Options"
-date:  
-parent: "Query Plans and Tuning"
---- 
-
-Planner options affect how Drill plans a query. You can use the ALTER SYSTEM|SESSION commands to modify certain planning options to optimize query plans and improve performance.  Typically, you modify options at the session level. See [ALTER SESSION]({{ site.baseurl }}/docs/alter-session/) for details on how to run the command.
- 
-The following planning options affect query planning and performance:
-
-* **planner.width.max\_per_node**  
-     Configure this option to achieve fine grained, absolute control over parallelization.
-
-     In this context width refers to fan out or distribution potential: the ability to run a query in parallel across the cores on a node and the nodes on a cluster. A physical plan consists of intermediate operations, known as query "fragments," that run concurrently, yielding opportunities for parallelism above and below each exchange operator in the plan. An exchange operator represents a breakpoint in the execution flow where processing can be distributed. For example, a single-process scan of a file may flow into an exchange operator, followed by a multi-process aggregation fragment.
- 
-     The maximum width per node defines the maximum degree of parallelism for any fragment of a query, but the setting applies at the level of a single node in the cluster. The default maximum degree of parallelism per node is calculated as follows, with the theoretical maximum automatically scaled back (and rounded down) so that only 70% of the actual available capacity is taken into account: number of active drillbits (typically one per node) * number of cores per node * 0.7
- 
-     For example, on a single-node test system with 2 cores and hyper-threading enabled: 1 * 4 * 0.7 = 3.
-     When you modify the default setting, you can supply any meaningful number. The system does not automatically scale down your setting.  
-
-* **planner.width\_max\_per_query**  
-     Default is 1000. The maximum number of threads than can run in parallel for a query across all nodes. Only change this setting when Drill over-parallelizes on very large clusters.
- 
-* **planner.slice_target**  
-     Default is 100000. The minimum number of estimated records to work with in a major fragment before applying additional parallelization.
- 
-* **planner.broadcast_threshold**  
-     Default is 10000000. The maximum number of records allowed to be broadcast as part of a join. After one million records, Drill reshuffles data rather than doing a broadcast to one side of the join. To improve performance you can increase this number, especially on 10GB Ethernet clusters.
- 
-
-
+---
+title: "Modifying Query Planning Options"
+date: 2016-11-21 22:28:44 UTC
+parent: "Query Plans and Tuning"
+--- 
+
+Planner options affect how Drill plans a query. You can use the ALTER SYSTEM|SESSION commands to modify certain planning options to optimize query plans and improve performance.  Typically, you modify options at the session level. See [ALTER SESSION]({{ site.baseurl }}/docs/alter-session/) for details on how to run the command.
+ 
+The following planning options affect query planning and performance:
+
+* **planner.width.max\_per_node**  
+     Configure this option to achieve fine-grained, absolute control over parallelization.
+
+     In this context width refers to fan out or distribution potential: the ability to run a query in parallel across the cores on a node and the nodes on a cluster. A physical plan consists of intermediate operations, known as query "fragments," that run concurrently, yielding opportunities for parallelism above and below each exchange operator in the plan. An exchange operator represents a breakpoint in the execution flow where processing can be distributed. For example, a single-process scan of a file may flow into an exchange operator, followed by a multi-process aggregation fragment.
+ 
+     The maximum width per node defines the maximum degree of parallelism for any fragment of a query, but the setting applies at the level of a single node in the cluster. The default maximum degree of parallelism per node is calculated as follows, with the theoretical maximum automatically scaled back (and rounded to a whole number) so that only 70% of the actual available capacity is taken into account: number of active drillbits (typically one per node) * number of cores per node * 0.7
+ 
+     For example, on a single-node test system with 2 cores and hyper-threading enabled: 1 * 4 * 0.7 = 2.8, which rounds to 3.
+     When you modify the default setting, you can supply any meaningful number. The system does not automatically scale down your setting.  
+
+* **planner.width.max\_per_query**  
+     Default is 1000. The maximum number of threads that can run in parallel for a query across all nodes. Only change this setting when Drill over-parallelizes on very large clusters.
+ 
+* **planner.slice_target**  
+     Default is 100000. The minimum number of estimated records to work with in a major fragment before applying additional parallelization.
+ 
+* **planner.broadcast_threshold**  
+     Default is 10000000. The maximum number of records allowed to be broadcast as part of a join. Beyond this number of records, Drill reshuffles data rather than doing a broadcast to one side of the join. To improve performance you can increase this number, especially on 10 Gigabit Ethernet clusters.
+ 
+
+
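The planning options above are changed with statements of the form ``ALTER SESSION SET `option` = value``. As a minimal sketch, the following Python helper builds such statements as strings. The helper name and the sample values are hypothetical; sending the statement to Drill through a client is outside the scope of the sketch.

```python
def alter_statement(option: str, value, scope: str = "SESSION") -> str:
    """Build an ALTER SESSION/SYSTEM statement for a Drill option.

    Hypothetical helper: only formats the SQL text, it does not run it.
    """
    scope = scope.upper()
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError("scope must be SESSION or SYSTEM")
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, (int, float)):
        literal = str(value)
    else:
        raise TypeError("only boolean and numeric option values are handled")
    # Option names contain dots, so they are quoted with backticks.
    return f"ALTER {scope} SET `{option}` = {literal};"


if __name__ == "__main__":
    # Sample values chosen for illustration, not recommendations.
    print(alter_statement("planner.broadcast_threshold", 20_000_000))
    print(alter_statement("planner.width.max_per_node", 4, scope="system"))
```

Defaulting to the session scope follows the guidance above that options are typically modified at the session level.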

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/query-plans-and-tuning/060-enabling-query-queuing.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/060-enabling-query-queuing.md b/_docs/performance-tuning/query-plans-and-tuning/060-enabling-query-queuing.md
index 1f76223..be07e19 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/060-enabling-query-queuing.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/060-enabling-query-queuing.md
@@ -1,49 +1,49 @@
----
-title: "Enabling Query Queuing"
-date:  
-parent: "Query Plans and Tuning"
---- 
-
-Drill runs all queries concurrently by default. However, Drill performance increases when a small number of queries run concurrently. You can enable query queues to limit the maximum number of queries that run concurrently. Splitting large queries into multiple small queries and enabling query queuing improves query performance.
- 
-When you enable query queuing, you configure large and small queues. Drill determines which queue to route a query to at runtime based on the size of the query. Drill can quickly complete the queries and then continue on to the next set of queries.
-
-## Example Configuration  
-
-For example, you configure the queue reserved for large queries for a 5-query maximum. You configure the queue reserved for small queries for 20 queries. Users start to run queries, and Drill receives the following query requests in this order:  
-
-* Query A (blue): 1 billion records, Drill estimates 10 million rows will be processed
-* Query B (red): 2 billion records, Drill estimates 20 million rows will be processed
-* Query C: 1 billion records
-* Query D: 100 records
- 
-The exec.queue.threshold default is 30 million, which is the estimated rows to be processed by the query. Queries A and B are queued in the large queue. The estimated rows to be processed reaches the 30 million threshold, filling the queue to capacity. The query C request arrives and goes on the wait list, and then query D arrives. Query D is queued immediately in the small queue because of its small size, as shown in the following diagram:
-
-![]({{ site.baseurl }}/docs/img/query_queuing.png)  
-
-The Drill queuing configuration in this example tends to give many users running small queries a rapid response. Users running a large query might experience some delay until an earlier-received large query returns, freeing space in the large queue to process queries that are waiting.
-
-Use the ALTER SYSTEM or ALTER SESSION commands with the options below to enable query queuing and set the maximum number of queries that each queue allows. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
-
-
-* **exec.queue.enable**  
-    Changes the state of query queues to control the number of queries that run simultaneously. When disabled, there is no limit on the number of concurrent queries.  
-    Default: false
-
-* **exec.queue.large**  
-    Sets the number of large queries that can run concurrently in the cluster.  
-    Range: 0-1000. Default: 10
-
-* **exec.queue.small**  
-    Sets the number of small queries that can run concurrently in the cluster. Range: 0-1001.  
-    Range: 0 - 1073741824 Default: 100
-
-* **exec.queue.threshold**  
-    Sets the cost threshold, which depends on the complexity of the queries in queue, for determining whether a query is large or small. Complex queries have higher thresholds.  
-    Range: 0-9223372036854775807 Default: 30000000
-
-* **exec.queue.timeout_millis**  
-    Indicates how long a query can wait in queue before the query fails.  
-    Range: 0-9223372036854775807 Default: 300000
-
-
+---
+title: "Enabling Query Queuing"
+date: 2016-11-21 22:28:44 UTC
+parent: "Query Plans and Tuning"
+--- 
+
+Drill runs all queries concurrently by default. However, Drill performance increases when a small number of queries run concurrently. You can enable query queues to limit the maximum number of queries that run concurrently. Splitting large queries into multiple small queries and enabling query queuing improves query performance.
+ 
+When you enable query queuing, you configure large and small queues. Drill determines which queue to route a query to at runtime based on the size of the query. Drill can quickly complete the queries and then continue on to the next set of queries.
+
+## Example Configuration  
+
+For example, you configure the queue reserved for large queries for a 5-query maximum. You configure the queue reserved for small queries for 20 queries. Users start to run queries, and Drill receives the following query requests in this order:  
+
+* Query A (blue): 1 billion records, Drill estimates 10 million rows will be processed
+* Query B (red): 2 billion records, Drill estimates 20 million rows will be processed
+* Query C: 1 billion records
+* Query D: 100 records
+ 
+The exec.queue.threshold default is 30 million, which is the estimated rows to be processed by the query. Queries A and B are queued in the large queue. The estimated rows to be processed reaches the 30 million threshold, filling the queue to capacity. The query C request arrives and goes on the wait list, and then query D arrives. Query D is queued immediately in the small queue because of its small size, as shown in the following diagram:
+
+![]({{ site.baseurl }}/docs/img/query_queuing.png)  
+
+The Drill queuing configuration in this example tends to give many users running small queries a rapid response. Users running a large query might experience some delay until an earlier-received large query returns, freeing space in the large queue to process queries that are waiting.
+
+Use the ALTER SYSTEM or ALTER SESSION commands with the options below to enable query queuing and set the maximum number of queries that each queue allows. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
+
+
+* **exec.queue.enable**  
+    Changes the state of query queues to control the number of queries that run simultaneously. When disabled, there is no limit on the number of concurrent queries.  
+    Default: false
+
+* **exec.queue.large**  
+    Sets the number of large queries that can run concurrently in the cluster.  
+    Range: 0-1000. Default: 10
+
+* **exec.queue.small**  
+    Sets the number of small queries that can run concurrently in the cluster. Range: 0-1001.  
+    Range: 0 - 1073741824 Default: 100
+
+* **exec.queue.threshold**  
+    Sets the cost threshold, which depends on the complexity of the queries in queue, for determining whether a query is large or small. Complex queries have higher thresholds.  
+    Range: 0-9223372036854775807 Default: 30000000
+
+* **exec.queue.timeout_millis**  
+    Indicates how long a query can wait in queue before the query fails.  
+    Range: 0-9223372036854775807 Default: 300000
+
+
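The interaction of exec.queue.large, exec.queue.small, and exec.queue.threshold can be modeled in a few lines of Python. This is a simplified illustration based on the option descriptions, not Drill's implementation: the class and method names are hypothetical, the costs are invented, and treating a cost strictly above the threshold as "large" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class QueueModel:
    """Toy model of large and small query queues (hypothetical names)."""
    large_slots: int = 10            # default of exec.queue.large
    small_slots: int = 100           # default of exec.queue.small
    threshold: int = 30_000_000      # default of exec.queue.threshold
    running_large: list = field(default_factory=list)
    running_small: list = field(default_factory=list)
    waiting: list = field(default_factory=list)

    def submit(self, name: str, estimated_cost: int) -> str:
        # Assumption: a cost strictly above the threshold marks a large query.
        is_large = estimated_cost > self.threshold
        running = self.running_large if is_large else self.running_small
        slots = self.large_slots if is_large else self.small_slots
        if len(running) < slots:
            running.append(name)
            return "large" if is_large else "small"
        # The matching queue is full, so the query waits for a free slot.
        self.waiting.append(name)
        return "waiting"


if __name__ == "__main__":
    queues = QueueModel(large_slots=1, small_slots=2)
    print(queues.submit("big-1", 50_000_000))
    print(queues.submit("big-2", 40_000_000))
    print(queues.submit("tiny", 100))
```

The model leaves out exec.queue.timeout_millis: in Drill, a query that waits longer than the timeout fails instead of waiting indefinitely.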

http://git-wip-us.apache.org/repos/asf/drill/blob/a0608abe/_docs/performance-tuning/query-plans-and-tuning/070-controlling-parallelization-to-balance-performance-with-multi-tenancy.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/070-controlling-parallelization-to-balance-performance-with-multi-tenancy.md b/_docs/performance-tuning/query-plans-and-tuning/070-controlling-parallelization-to-balance-performance-with-multi-tenancy.md
index fb752e5..3d042f1 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/070-controlling-parallelization-to-balance-performance-with-multi-tenancy.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/070-controlling-parallelization-to-balance-performance-with-multi-tenancy.md
@@ -1,11 +1,11 @@
----
-title: "Controlling Parallelization to Balance Performance with Multi-Tenancy"
-date:  
-parent: "Query Plans and Tuning"
---- 
-
-When you run Drill in a multi-tenant environment, (in conjunction with other workloads in a cluster, such as MapReduce) you may need to modify Drill settings and options to maximize performance, or reduce the allocated resources to other applications. See [Configuring Multi-Tenant Resources]({{ site.baseurl }}/docs/configuring-multitenant-resources/).
-Drill is memory intensive and therefore requires sufficient memory to run optimally. You can modify how much memory that you want allocated to Drill. Drill typically performs better with as much memory as possible. See [Configuring Drill Memory]({{ site.baseurl }}/docs/configuring-drill-memory/).
- 
-Reducing the level of parallelism in Drill can also help to balance the workloads and avoid resource conflicts. See [Configuring Parallelization]({{ site.baseurl }}/docs/configuring-resources-for-a-shared-drillbit/#configuring-parallelization).
-
+---
+title: "Controlling Parallelization to Balance Performance with Multi-Tenancy"
+date: 2016-11-21 22:28:45 UTC
+parent: "Query Plans and Tuning"
+--- 
+
+When you run Drill in a multi-tenant environment (in conjunction with other workloads in a cluster, such as MapReduce), you may need to modify Drill settings and options to maximize performance, or reduce the allocated resources to other applications. See [Configuring Multi-Tenant Resources]({{ site.baseurl }}/docs/configuring-multitenant-resources/).
+Drill is memory intensive and therefore requires sufficient memory to run optimally. You can modify how much memory is allocated to Drill. Drill typically performs better with as much memory as possible. See [Configuring Drill Memory]({{ site.baseurl }}/docs/configuring-drill-memory/).
+ 
+Reducing the level of parallelism in Drill can also help to balance the workloads and avoid resource conflicts. See [Configuring Parallelization]({{ site.baseurl }}/docs/configuring-resources-for-a-shared-drillbit/#configuring-parallelization).
+

