drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: add distribution operator descriptions
Date Fri, 03 Jun 2016 22:17:26 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages b3b409c8b -> ea1aa1fa7


add distribution operator descriptions


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/ea1aa1fa
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/ea1aa1fa
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/ea1aa1fa

Branch: refs/heads/gh-pages
Commit: ea1aa1fa79b512ab88263bb06c582c057e4e21c5
Parents: b3b409c
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Fri Jun 3 15:11:50 2016 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Fri Jun 3 15:11:50 2016 -0700

----------------------------------------------------------------------
 .../020-physical-operators.md                   | 234 ++++++++++---------
 1 file changed, 118 insertions(+), 116 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/ea1aa1fa/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md b/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
index bbcf1c8..d019b82 100644
--- a/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
+++ b/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
@@ -1,116 +1,118 @@
----
-title: "Physical Operators"
-date:  
-parent: "Performance Tuning Reference"
---- 
-
-This document describes the physical operators that Drill uses in query plans.
-
-## Distribution Operators  
-
-Drill uses the following operators to perform data distribution over the network:  
-
-* HashToRandomExchange
-* HashToMergeExchange
-* UnionExchange
-* SingleMergeExchange
-* BroadcastExchange
-* UnorderedMuxExchange
-
-## Join Operators  
-
-Drill uses the following operators:
-
-| Operator         | Description |
-|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Hash Join        | A Hash Join is used for inner joins, left, right and full outer joins. A hash table is built on the rows produced by the inner child of the Hash Join. The outer child rows are used to probe the hash table and find matches. This operator Holds the entire dataset for the right hand side of the join in memory which could be up to 2 billion records per minor fragment. |
-| Merge Join       | A Merge Join is used for inner join, left and right outer joins. Inputs to the Merge Join must be sorted. It reads the sorted input streams from both sides and finds matching rows. This operator holds the amount of memory of one incoming record batch from each side of the join. In addition, if there are repeating values in the right hand side of the join, the Merge Join will hold record batches for as long as a repeated value extends. |
-| Nested Loop Join | A Nested Loop Join is used for certain types of cartesian joins and inequality joins. |
-
-## Aggregate Operators  
-
-Drill uses the following aggregate operators:  
-
-| Operator            | Description |
-|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Hash Aggregate      | A Hash Aggregate performs grouped aggregation on the input data by building a hash table on the GROUP-BY keys and computing the aggregate values within each group. This operator holds memory for each aggregation grouping and each aggregate value, up to 2 billion values per minor fragment. |
-| Streaming Aggregate | A Streaming Aggregate performs grouped aggregation and non-grouped aggregation. For grouped aggregation, the data must be sorted on the GROUP-BY keys. Aggregate values are computed within each group. For non-grouped aggregation, data does not have to be sorted. This operator maintains a single aggregate grouping (keys and aggregate intermediate values) at a time in addition to the size of one incoming record batch. |
-
-## Sort and Limit Operators  
-
-Drill uses the following sort and limiter operators:  
-
-| Operator     | Description |
-|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Sort         | A Sort operator is used to perform an ORDER BY and as an upstream operator for other operations that require sorted data such as Merge Join, Streaming Aggregate. |
-| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory. This operator will also start spooling to the disk in the case that there is memory pressure. In this case, the external sort will continue to try to use as much memory as available. In all cases, external sort will hold at least one record batch in memory for each record spill. Spills are currently sized based on the amount of memory available to the external sort operator. |
-| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT. |
-| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause. |
-
-## Projection Operators  
-
-Drill uses the following projection operators: 
-
-| Operator     | Description |
-|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Project      | A Project operator projects columns and/or expressions involving columns and constants. This operator holds one incoming record batch plus any additional materialized projects for the same number of rows as the incoming record batch. |
-| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory. This operator will also start spooling to the disk in the case that there is memory pressure. In this case, the external sort will continue to try to use as much memory as available. In all cases, external sort will hold at least one record batch in memory for each record spill. Spills are currently sized based on the amount of memory available to the external sort operator. |
-| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT. |
-| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause. |
-
-## Filter and Related Operators  
-
-Drill uses the following filter and related operators:  
-
-| Operator               | Description |
-|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Filter                 | A Filter operator is used to evaluate the WHERE clause and HAVING clause predicates. These predicates may consist of join predicates as well as single table predicates. The join predicates are evaluated by a join operator and the remaining predicates are evaluated by the Filter operator. The amount of memory it consumes is slightly more than the size of one incoming record batch. |
-| SelectionVectorRemover | A SelectionVectorRemover is used in conjunction with either a Sort or Filter operator. This operator maintains roughly twice the amount of memory as required by a single incoming record batch. |
-
-## Set Operators  
-
-Drill uses the following set operators:  
-
-| Operator  | Description |
-|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Union-All | A Union-All operator accepts rows from 2 input streams and produces a single output stream where the left input rows are emitted first followed by the right input rows. The column names of the output stream are inherited from the left input. The column types of the two child inputs must be compatible. |
-
-## Scan Operators  
-
-Drill uses the following scan operators:    
-
-| Operator | Description |
-|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Scan     | Performs a scan of the underlying table. The table may be in one of several formats, such as Parquet, Text, JSON, and so on. The Scan operator encapsulates the formats into one operator. |
-
-## Receiver Operators 
-
-Drill uses the following receiver operators: 
-
-| Operator          | Description |
-|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| UnorderedReceiver | The unordered receiver operator can hold up to 5 incoming record batches. |
-| MergingReceiver   | This operator holds up to 5 record batches for each incoming stream (generally either number of nodes or number of sending fragments, depending on use of muxxing). |
-
-## Sender Operators  
-
-Drill uses the following sender operators:  
-
-| Operator        | Description |
-|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| PartitionSender | The PartitionSender operator maintains a queue for each outbound destination. May be either the number of outbound minor fragments or the number of the nodes, depending on the use of muxxing operations. Each queue may store up to 3 record batches for each destination. |
-
-## File Writers  
-
-Drill uses the following file writers:  
-
-| Operator          | Description |
-|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
-| ParquetFileWriter | The ParquetFileWriter buffers approximately twice the default Parquet row group size in memory per minor fragment (default in Drill is 512mb). |
-
-
-
-
- 
-
-
+---
+title: "Physical Operators"
+date: 2016-06-03 22:11:51 UTC
+parent: "Performance Tuning Reference"
+--- 
+
+This document describes the physical operators that Drill uses in query plans.
+
+## Distribution Operators  
+
+Drill uses the following operators to perform data distribution over the network:  
+
+| Operator             | Description |
+|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| HashToRandomExchange | A HashToRandomExchange gets an input row, computes a hash value on the distribution key, determines the destination receiver based on the hash value, and sends the row in a batch operation. The join key or aggregation group-by keys are examples of distribution keys. The destination receiver is a minor fragment on a destination node. |
+| HashToMergeExchange  | A HashToMergeExchange is similar to the HashToRandomExchange operator, except that each destination receiver merges incoming streams of sorted data received from a sender. |
+| UnionExchange        | A UnionExchange is a serialization operator in which each sender sends to a single (common) destination. The receiver “unions” the input streams from various senders. |
+| SingleMergeExchange  | A SingleMergeExchange is a distribution operator in which each sender sends a sorted stream of data to a single receiver. The receiver performs a Merge operation to merge all of the incoming streams. This operator is useful when performing an ORDER BY operation that requires a final global ordering. |
+| BroadcastExchange    | A BroadcastExchange is a distribution operation in which each sender sends its input data to all N receivers via a broadcast. |
+| UnorderedMuxExchange | An UnorderedMuxExchange is an operation that multiplexes the data from all minor fragments on a node so the data can be sent out on a single channel to a destination receiver. A sender node only needs to maintain buffers for each receiving node instead of each receiving minor fragment on every node. |
+
+## Join Operators  
+
+Drill uses the following join operators:
+
+| Operator         | Description |
+|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Hash Join        | A Hash Join is used for inner joins and for left, right, and full outer joins. A hash table is built on the rows produced by the inner child of the Hash Join. The outer child rows are used to probe the hash table and find matches. This operator holds the entire dataset for the right hand side of the join in memory, which could be up to 2 billion records per minor fragment. |
+| Merge Join       | A Merge Join is used for inner joins and for left and right outer joins. Inputs to the Merge Join must be sorted. It reads the sorted input streams from both sides and finds matching rows. This operator holds the amount of memory of one incoming record batch from each side of the join. In addition, if there are repeating values in the right hand side of the join, the Merge Join will hold record batches for as long as a repeated value extends. |
+| Nested Loop Join | A Nested Loop Join is used for certain types of cartesian joins and inequality joins. |
+
+## Aggregate Operators  
+
+Drill uses the following aggregate operators:  
+
+| Operator            | Description |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Hash Aggregate      | A Hash Aggregate performs grouped aggregation on the input data by building a hash table on the GROUP-BY keys and computing the aggregate values within each group. This operator holds memory for each aggregation grouping and each aggregate value, up to 2 billion values per minor fragment. |
+| Streaming Aggregate | A Streaming Aggregate performs grouped aggregation and non-grouped aggregation. For grouped aggregation, the data must be sorted on the GROUP-BY keys. Aggregate values are computed within each group. For non-grouped aggregation, data does not have to be sorted. This operator maintains a single aggregate grouping (keys and aggregate intermediate values) at a time in addition to the size of one incoming record batch. |
+
+## Sort and Limit Operators  
+
+Drill uses the following sort and limit operators:  
+
+| Operator     | Description |
+|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Sort         | A Sort operator is used to perform an ORDER BY and as an upstream operator for other operations that require sorted data, such as Merge Join and Streaming Aggregate. |
+| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory. This operator will also start spooling to the disk in the case that there is memory pressure. In this case, the external sort will continue to try to use as much memory as is available. In all cases, external sort will hold at least one record batch in memory for each record spill. Spills are currently sized based on the amount of memory available to the external sort operator. |
+| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT. |
+| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause. |
+
+## Projection Operators  
+
+Drill uses the following projection operators: 
+
+| Operator     | Description |
+|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Project      | A Project operator projects columns and/or expressions involving columns and constants. This operator holds one incoming record batch plus any additional materialized projects for the same number of rows as the incoming record batch. |
+| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory. This operator will also start spooling to the disk in the case that there is memory pressure. In this case, the external sort will continue to try to use as much memory as available. In all cases, external sort will hold at least one record batch in memory for each record spill. Spills are currently sized based on the amount of memory available to the external sort operator. |
+| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT. |
+| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause. |
+
+## Filter and Related Operators  
+
+Drill uses the following filter and related operators:  
+
+| Operator               | Description |
+|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Filter                 | A Filter operator evaluates the WHERE clause and HAVING clause predicates. These predicates may consist of join predicates as well as single-table predicates. Join predicates are evaluated by a join operator, and the remaining predicates are evaluated by the Filter operator. The Filter operator consumes slightly more memory than the size of one incoming record batch. |
+| SelectionVectorRemover | A SelectionVectorRemover is used in conjunction with either a Sort or Filter operator. This operator uses roughly twice the memory required by a single incoming record batch. |
+
+## Set Operators  
+
+Drill uses the following set operators:  
+
+| Operator  | Description |
+|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Union-All | A Union-All operator accepts rows from two input streams and produces a single output stream in which the left input rows are emitted first, followed by the right input rows. The column names of the output stream are inherited from the left input. The column types of the two child inputs must be compatible. |
+
+## Scan Operators  
+
+Drill uses the following scan operators:    
+
+| Operator | Description |
+|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Scan     | Performs a scan of the underlying table. The table may be in one of several formats, such as Parquet, Text, JSON, and so on. The Scan operator encapsulates the formats into one operator. |
+
+## Receiver Operators 
+
+Drill uses the following receiver operators: 
+
+| Operator          | Description |
+|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| UnorderedReceiver | The UnorderedReceiver operator can hold up to 5 incoming record batches. |
+| MergingReceiver   | This operator holds up to 5 record batches for each incoming stream (generally either the number of nodes or the number of sending fragments, depending on the use of muxing). |
+
+## Sender Operators  
+
+Drill uses the following sender operators:  
+
+| Operator        | Description |
+|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| PartitionSender | The PartitionSender operator maintains a queue for each outbound destination. The number of destinations may be either the number of outbound minor fragments or the number of nodes, depending on the use of muxing operations. Each queue may store up to 3 record batches for its destination. |
+
+## File Writers  
+
+Drill uses the following file writers:  
+
+| Operator          | Description |
+|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
+| ParquetFileWriter | The ParquetFileWriter buffers approximately twice the default Parquet row group size in memory per minor fragment (the default row group size in Drill is 512 MB). |
+
+
+
+
+ 
+
+

