drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: Updates/edits for Drill 1.5
Date Sat, 06 Feb 2016 00:19:20 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 744a86fa6 -> 7f224e550


Updates/edits for Drill 1.5


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/7f224e55
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/7f224e55
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/7f224e55

Branch: refs/heads/gh-pages
Commit: 7f224e5502770f964182bcb1bc71164c61ed4fd1
Parents: 744a86f
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Fri Feb 5 16:18:10 2016 -0800
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Fri Feb 5 16:18:10 2016 -0800

----------------------------------------------------------------------
 _docs/110-troubleshooting.md                    |  9 ++-
 .../020-configuring-drill-memory.md             |  8 +-
 .../010-configuration-options-introduction.md   |  4 +-
 ...d-hash-based-memory-constrained-operators.md | 81 ++++++++++----------
 4 files changed, 55 insertions(+), 47 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/7f224e55/_docs/110-troubleshooting.md
----------------------------------------------------------------------
diff --git a/_docs/110-troubleshooting.md b/_docs/110-troubleshooting.md
index 560892f..204a84e 100644
--- a/_docs/110-troubleshooting.md
+++ b/_docs/110-troubleshooting.md
@@ -1,6 +1,6 @@
 ---
 title: "Troubleshooting"
-date: 2016-01-05
+date: 2016-02-06 00:18:11 UTC
 ---
 
 You may experience certain known issues when using Drill. This document lists some known issues and resolutions for each.
@@ -55,9 +55,12 @@ If you have any of the following problems, try the suggested solution:
 * [Error Starting Drill in Embedded Mode]({{site.baseurl}}/docs/troubleshooting/#error-starting-drill-in-embedded-mode)
 
 ### Memory Issues
-Symptom: Memory problems occur when you run certain queries, such as those that perform window functions.
+Symptom: Memory problems occur when you run certain queries, such as those with sort operators.
+
+Solution: Increase the value of the [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options) option, which sets the maximum amount of direct memory allocated to the sort operator in each query on a node. If a query plan contains multiple sort operators, they all share this memory.

+If you continue to encounter memory issues after increasing the  `planner.memory.max_query_memory_per_node` value, you can also reduce the value of the `planner.width.max_per_node` option to reduce the level of parallelism per node. However, this may increase the amount of time required for a query to complete.
+See [Configuring Drill Memory]({{site.baseurl}}/docs/configuring-drill-memory/).
 
-Solution: The [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options) system option value determines the memory limits per node for each running query, especially for those involving external sorts, such as window functions. When you have a large amount of direct memory allocated, but still encounter memory issues when running these queries, increase the value of the option.
 
 ### Query Parsing Errors
 Symptom:  

http://git-wip-us.apache.org/repos/asf/drill/blob/7f224e55/_docs/configure-drill/020-configuring-drill-memory.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/020-configuring-drill-memory.md b/_docs/configure-drill/020-configuring-drill-memory.md
index 7d99c6e..4dc1292 100644
--- a/_docs/configure-drill/020-configuring-drill-memory.md
+++ b/_docs/configure-drill/020-configuring-drill-memory.md
@@ -1,6 +1,6 @@
 ---
 title: "Configuring Drill Memory"
-date:  
+date: 2016-02-06 00:18:12 UTC
 parent: "Configure Drill"
 ---
 
@@ -15,7 +15,11 @@ The JVM’s heap memory does not limit the amount of direct memory available in
 a Drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
 suffice because Drill avoids having data sit in heap memory.
 
-The [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options) system option value determines the memory limits per node for each running query, especially for those involving external sorts, such as window functions. When you have a large amount of direct memory allocated, but still encounter memory issues when running these queries, increase the value of the option.
+As of Drill 1.5, Drill uses a new allocator that improves an operator’s use of direct memory and tracks the memory use more accurately. Due to this change, the sort operator (in queries that ran successfully in previous releases) may not have enough memory, resulting in a failed query and out of memory error instead of spilling to disk.
+
+
+The [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options) system option value sets the maximum amount of direct memory allocated to the sort operator in each query on a node. If a query plan contains multiple sort operators, they all share this memory. If you encounter memory issues when running queries with sort operators, increase the value of this option. If you continue to encounter memory issues after increasing this value, you can also reduce the value of the [`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/) option to reduce the level of parallelism per node. However, this may increase the amount of time required for a query to complete.  
+
 
 ## Modifying Drillbit Memory
 

http://git-wip-us.apache.org/repos/asf/drill/blob/7f224e55/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
index 0647d70..df2fcb9 100644
--- a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
+++ b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
@@ -1,6 +1,6 @@
 ---
 title: "Configuration Options Introduction"
-date:  
+date: 2016-02-06 00:18:12 UTC
 parent: "Configuration Options"
 ---
 Drill provides many configuration options that you can enable, disable, or
@@ -61,7 +61,7 @@ The sys.options table lists the following options that you can set as a system o
 | planner.memory.enable_memory_estimation        | FALSE            | Toggles the state of
memory estimation and re-planning of the query. When enabled, Drill conservatively estimates
memory requirements and typically excludes these operators from the plan and negatively impacts
performance.                                                                             
                                                     |
 | planner.memory.hash_agg_table_factor           | 1.1              | A heuristic value for
influencing the size of the hash aggregation table.                                      
                                                                                         
                                                                                         
                                                             |
 | planner.memory.hash_join_table_factor          | 1.1              | A heuristic value for
influencing the size of the hash aggregation table.                                      
                                                                                         
                                                                                         
                                                             |
-| planner.memory.max_query_memory_per_node       | 2147483648 bytes | Sets the maximum estimate
of memory for a query per node in bytes. If the estimate is too low, Drill re-plans the query
without memory-constrained operators.                                                    
                                                                                         
                                                     |
+| planner.memory.max_query_memory_per_node       | 2147483648 bytes | Sets the maximum amount
of direct memory allocated to the sort operator in each query on a node. If a query plan contains
multiple sort operators, they all share this memory. If you encounter memory issues when running
queries with sort operators, increase the value of this option.                          
                                                                                         
                                                                               |
 | planner.memory.non_blocking_operators_memory   | 64               | Extra query memory
per node for non-blocking operators. This option is currently used only for memory estimation.
Range: 0-2048 MB                                                                         
                                                                                         
                                                           |
 | planner.memory_limit                           | 268435456 bytes  | Defines the maximum
amount of direct memory allocated to a query for planning. When multiple queries run concurrently,
each query is allocated the amount of memory set by this parameter.Increase the value of this
parameter and rerun the query if partition pruning failed due to insufficient memory.    
                                                  |
 | planner.nestedloopjoin_factor                  | 100              | A heuristic value for
influencing the nested loop join.                                                        
                                                                                         
                                                                                         
                                                             |

http://git-wip-us.apache.org/repos/asf/drill/blob/7f224e55/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
index e6f45c9..9b7aabc 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
@@ -1,40 +1,41 @@
----
-title: "Sort-Based and Hash-Based Memory-Constrained Operators"
-date:  
-parent: "Query Plans and Tuning"
---- 
-
-Drill uses hash-based and sort-based operators depending on the query characteristics. Hash aggregation and hash join are hash-based operations. Streaming aggregation and merge join are sort-based operations. Both hash-based and sort-based operations consume memory, however the hash aggregate and hash join operators are the fastest and most memory intensive operators.
- 
-Currently, hash-based operations do not spill to disk as needed, but the sort-based operations do. When Drill plans a sort-based query, it evaluates the size of available memory multiplied by a configurable reduction constant (for parallelization purposes) and then limits the sort-based operations to the maximum of this amount of memory.
-
-If the hash-based operators run out of memory during execution, the query fails. If large hash operations do not fit in memory on your system, you can disable these operations. When disabled, Drill creates alternative plans that allow spilling to disk.
-
-You can also modify the minimum hash table size, increasing the size for very large aggregations or joins when you have large amounts of memory for Drill to use. If you have large data sets, you can increase this hash table size to improve performance.
- 
-Use the ALTER SYSTEM or ALTER SESSION commands with the options in the table below to disable the hash aggregate and hash join operators, modify the hash table size, disable memory estimation, or set the estimated maximum amount of memory for a query. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
-
-The following options control the hash-based operators:
-
-* **planner.enable_hashagg**  
-    Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enable is recommended. Default: true
-
-* **planner.enable_hashjoin**  
-    Enable the memory hungry hash join. Drill assumes that a query will have adequate memory to complete and tries to use the fastest operations possible to complete the planned inner, left, right, or full outer joins using a hash table. Does not write to disk. Disabling hash join allows Drill to manage arbitrarily large data in a small memory footprint. Default: true
-
-* **exec.min_hash_table_size**  
-    Starting size for hash tables. Increase according to available memory to improve performance.  
-    Default: 65536 Range: 0 - 1073741824
-
-* **exec.max\_hash\_table_size**  
-    Ending size for hash tables.  
-    Default: 1073741824 Range: 0 - 1073741824
-
-* **planner.memory.enable\_memory_estimation**  
-    Toggles the state of memory estimation and re-planning of the query. When enabled, Drill conservatively estimates memory requirements and typically excludes memory-constrained operators from the plan and negatively impacts performance.  
-    Default: false
-
-
-* **planner.memory.max\_query\_memory\_per_node**  
-    Sets the maximum estimate of memory for a query per node. If the estimate is too low, Drill re-plans the query without memory-constrained operators.  
-    Default: 2147483648
+---
+title: "Sort-Based and Hash-Based Memory-Constrained Operators"
+date: 2016-02-06 00:18:13 UTC
+parent: "Query Plans and Tuning"
+--- 
+
+Drill uses hash-based and sort-based operators depending on the query characteristics. Hash aggregation and hash join are hash-based operations. Streaming aggregation and merge join are sort-based operations. Both hash-based and sort-based operations consume memory, however the hash aggregate and hash join operators are the fastest and most memory intensive operators.
+ 
+Currently, hash-based operations do not spill to disk as needed, but the sort-based operations do. When Drill plans a sort-based query, it evaluates the size of available memory multiplied by a configurable reduction constant (for parallelization purposes) and then limits the sort-based operations to the maximum of this amount of memory.
+
+If the hash-based operators run out of memory during execution, the query fails. If large hash operations do not fit in memory on your system, you can disable these operations. When disabled, Drill creates alternative plans that allow spilling to disk.
+
+You can also modify the minimum hash table size, increasing the size for very large aggregations or joins when you have large amounts of memory for Drill to use. If you have large data sets, you can increase this hash table size to improve performance.
+ 
+Use the ALTER SYSTEM or ALTER SESSION commands with the options in the table below to disable the hash aggregate and hash join operators, modify the hash table size, disable memory estimation, or set the estimated maximum amount of memory for a query. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
+
+The following options control the hash-based operators:
+
+* **planner.enable_hashagg**  
+    Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enable is recommended. Default: true
+
+* **planner.enable_hashjoin**  
+    Enable the memory hungry hash join. Drill assumes that a query will have adequate memory to complete and tries to use the fastest operations possible to complete the planned inner, left, right, or full outer joins using a hash table. Does not write to disk. Disabling hash join allows Drill to manage arbitrarily large data in a small memory footprint. Default: true
+
+* **exec.min_hash_table_size**  
+    Starting size for hash tables. Increase according to available memory to improve performance.  
+    Default: 65536 Range: 0 - 1073741824
+
+* **exec.max\_hash\_table_size**  
+    Ending size for hash tables.  
+    Default: 1073741824 Range: 0 - 1073741824
+
+* **planner.memory.enable\_memory_estimation**  
+    Toggles the state of memory estimation and re-planning of the query. When enabled, Drill conservatively estimates memory requirements and typically excludes memory-constrained operators from the plan and negatively impacts performance.  
+    Default: false
+
+
+* **planner.memory.max\_query\_memory\_per_node**  
+    Sets the maximum amount of direct memory allocated to the sort operator in each query on a node. If a query plan contains multiple sort operators, they all share this memory. If you encounter memory issues when running queries with sort operators, increase the value of this option.  
+    Default: 2147483648 
+    Default: 2147483648 
+
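
Note on the tuning advice in this commit: the updated docs recommend raising `planner.memory.max_query_memory_per_node` first and, if memory issues persist, lowering `planner.width.max_per_node`. The sketch below only builds the corresponding `ALTER SESSION` / `ALTER SYSTEM` statements as strings and shows the byte arithmetic behind the documented default (2147483648 bytes is 2 GiB). The helper names `gib_to_bytes`, `alter_statement`, and `sort_memory_tuning` are hypothetical and not part of Drill; the 4 GiB target and the width of 2 are illustrative assumptions, not recommended values.

```python
# Minimal sketch: build Drill option statements for the sort-memory advice.
# The option names come from the docs in this commit; the helper names and
# the example values are assumptions made for illustration only.

GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_bytes(gib):
    """Convert a size in GiB to whole bytes (the unit the option uses)."""
    return int(gib * GIB)


def alter_statement(option, value, scope="SESSION"):
    """Return an ALTER SESSION or ALTER SYSTEM statement for one option."""
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError("scope must be 'SESSION' or 'SYSTEM'")
    # Drill option names are quoted with backticks.
    return "ALTER {} SET `{}` = {};".format(scope, option, value)


def sort_memory_tuning(sort_memory_gib, max_width=None, scope="SESSION"):
    """Statements for the two steps described in the docs.

    Step 1: raise the memory shared by the sort operators in a query.
    Step 2 (optional): reduce per-node parallelism if problems persist.
    """
    statements = [
        alter_statement(
            "planner.memory.max_query_memory_per_node",
            gib_to_bytes(sort_memory_gib),
            scope,
        )
    ]
    if max_width is not None:
        statements.append(
            alter_statement("planner.width.max_per_node", max_width, scope)
        )
    return statements


if __name__ == "__main__":
    # Example: double the 2 GiB default and limit the width to 2 (assumed values).
    for stmt in sort_memory_tuning(4, max_width=2):
        print(stmt)
```

The statements are generated at the session level by default, in line with the docs' remark that options are typically set per session unless the setting should persist across all sessions.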

