From: bridgetb@apache.org
To: commits@drill.apache.org
Date: Wed, 06 May 2015 22:46:32 -0000
Subject: [08/11] drill git commit: reorg

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/070-configuring-user-impersonation.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/070-configuring-user-impersonation.md b/_docs/manage-drill/070-configuring-user-impersonation.md
deleted file mode 100644
index 7f22d9d..0000000
--- a/_docs/manage-drill/070-configuring-user-impersonation.md
+++ /dev/null
@@ -1,150 +0,0 @@
---
title: "Configuring User Impersonation"
parent: "Manage Drill"
---
Impersonation allows a service to act on behalf of a client while performing the action requested by the client. By default, user impersonation is disabled in Drill. You can configure user impersonation in the drill-override.conf file.

When you enable impersonation, Drill executes client requests as the user logged in to the client. Drill passes the user credentials to the file system, and the file system checks to see if the user has permission to access the data. When you enable authentication, Drill uses the pluggable authentication module (PAM) to authenticate a user's identity before the user can access the Drillbit process. See User Authentication.

If impersonation is not configured, Drill executes all of the client requests against the file system as the user that started the Drillbit service on the node. This is typically a privileged user. The file system verifies that the system user has permission to access the data.

## Example
When impersonation is disabled and user Bob issues a query through the SQLLine client, SQLLine passes the query to the connecting Drillbit. The Drillbit executes the query as the system user that started the Drill process on the node. For the purpose of this example, we will assume that the system user has full access to the file system. Drill executes the query and returns the results back to the client.

![](http://i.imgur.com/4XxQK2I.png)

When impersonation is enabled and user Bob issues a query through the SQLLine client, the Drillbit executes the query against the file system as Bob. The file system checks to see if Bob has permission to access the data. If so, Drill returns the query results to the client. If Bob does not have permission, Drill returns an error.

![](http://i.imgur.com/oigWqVg.png)

## Impersonation Support
The following table lists the clients, storage plugins, and types of queries that you can use with impersonation in Drill:

| Type | Supported | Not Supported |
|------|-----------|---------------|
| Clients | SQLLine, ODBC, JDBC | Drill Web UI, REST API |
| Storage Plugins | File System | Hive, HBase |
| Queries | SHOW SCHEMAS, SHOW DATABASES, SHOW TABLES, CTAS, SELECT, CREATE VIEW, DROP VIEW, SHOW FILES | |

When you enable impersonation, the setting applies to queries on data and metadata. For example, if you issue the SHOW SCHEMAS command, Drill impersonates the user logged in to the client to access the requested metadata. If you issue a SELECT query on a workspace, Drill impersonates the user logged in to the client to access the requested data. Drill applies impersonation to queries issued using the commands listed in the Queries row above.

To successfully run the CTAS and CREATE VIEW commands, a user must have write permissions on the directory where the table or view will exist. Running these commands creates artifacts on the file system.
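
As a minimal sketch of what those two commands look like with impersonation enabled (the workspace and table names below are illustrative, not part of the original example), the impersonated user needs write permission on the directory backing the target workspace:

    -- Assumes dfs.tmp is a writable workspace and the source path exists;
    -- both names are hypothetical. Drill runs these as the connected user.
    CREATE TABLE dfs.tmp.`donuts_copy` AS SELECT * FROM dfs.`/data/donuts.json`;
    CREATE VIEW dfs.tmp.`donuts_view` AS SELECT id, name FROM dfs.tmp.`donuts_copy`;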

## Impersonation and Views
You can use views with impersonation to provide granular access to data and protect sensitive information. When you create a view, Drill stores the view definition in a file and suffixes the file with .drill.view. For example, if you create a view named myview, Drill creates a view file named myview.drill.view and saves it in the current workspace or the workspace specified, such as dfs.views.myview. See the [CREATE VIEW]({{site.baseurl}}/docs/create-view-command/) command.

You can create a view and grant read permissions on the view to give other users access to the data that the view references. When a user queries the view, Drill impersonates the view owner to access the underlying data. A user with read access to a view can create new views from the originating view to further restrict access to the data.

### View Permissions
A user must have write permission on a directory or workspace to create a view, as well as read access on the table(s) and/or view(s) that the view references. When a user creates a view, permission on the view is set to owner by default. Users can query an existing view or create new views from the view if they have read permissions on the view file and on the directory or workspace where the view file is stored.

When users query a view, Drill accesses the underlying data as the user that created the view. If a user does not have permission to access a view, the query fails and Drill returns an error. Only the view owner or a superuser can modify view permissions to change them from owner to group or world.

The view owner or a superuser can modify permissions on the view file directly, or they can set view permissions at the system or session level prior to creating any views. Any user that alters view permissions must have write access on the directory or workspace in which they are working. See Modifying Permissions on a View File and Modifying SYSTEM|SESSION Level View Permissions.

#### Modifying Permissions on a View File
Only a view owner or a superuser can modify permissions on a view file to change them from owner to group or world readable. Before you grant permission to users to access a view, verify that they have access to the directory or workspace in which the view file is stored.

Use the `chmod` and `chown` commands with the appropriate octal code to change permissions on a view file:

    hadoop fs -chmod <permission> <view-file-name>
    hadoop fs -chown <user>:<group> <view-file-name>

Example: `hadoop fs -chmod 750 employees.drill.view`

#### Modifying SYSTEM|SESSION Level View Permissions
Use the `ALTER SESSION|SYSTEM` command with the `new_view_default_permissions` parameter and the appropriate octal code to set view permissions at the system or session level prior to creating a view.

    ALTER SESSION SET `new_view_default_permissions` = '<octal-code>';
    ALTER SYSTEM SET `new_view_default_permissions` = '<octal-code>';

Example: ``ALTER SESSION SET `new_view_default_permissions` = '777';``

After you set this parameter, Drill applies the same permissions to each view created during the session, or across all sessions if set at the system level.
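
For illustration only (the workspace, view, and column names below are hypothetical), the session-level default can be set immediately before creating a view so that the resulting view file is group readable:

    -- Assumes dfs.views is a workspace the current user can write to.
    -- 750 leaves the .drill.view file readable by the owner's group only.
    ALTER SESSION SET `new_view_default_permissions` = '750';
    CREATE VIEW dfs.views.`customer_contact` AS
    SELECT cust_id, cust_name, cust_phone FROM dfs.`/data/customers`;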

## Chained Impersonation
You can configure Drill to allow chained impersonation on views when you enable impersonation in the `drill-override.conf` file. Chained impersonation controls the number of identity transitions that Drill can make when a user queries a view. Each identity transition is equal to one hop.

You can set the maximum number of hops on views to limit the number of times that Drill can impersonate a different user when a user queries a view. The default maximum number of hops is 3. When the maximum number of hops is set to 0, Drill does not allow impersonation chaining, and a user can only read data that they have direct permission to access. You may set the chain length to 0 to protect highly sensitive data.

The following example depicts a scenario where the maximum hop number is set to 3, and Drill must impersonate three users to access data when Chad queries a view that Jane created:

![](http://i.imgur.com/wwpStcs.png)

In the previous example, Joe created view V3 from views that user Frank created. In the following example, Joe created view V3 by joining a view that Frank created with a view that Bob created, thus increasing the number of identity transitions that Drill makes from 3 to 4, which exceeds the maximum hop setting of 3.

In this scenario, when Chad queries Jane's view, Drill returns an error stating that the query cannot complete because the number of hops required to access the data exceeds the configured maximum hop setting of 3.

![](http://i.imgur.com/xO2yIDN.png)

If users encounter this error, you can increase the maximum hop setting to accommodate users running queries on views. When configuring the maximum number of hops that Drill can make, consider that joined views increase the number of identity transitions required for Drill to access the underlying data.

#### Configuring Impersonation and Chaining
Chaining is a system-wide setting that applies to all views. Currently, Drill does not provide an option to allow different chain lengths for different views.

Complete the following steps on each Drillbit node to enable user impersonation and set the maximum number of chained user hops that Drill allows (an optional way to verify the settings afterward is sketched after the steps):

1. Navigate to `<drill_installation_directory>/conf/` and edit `drill-override.conf`.
2. Under `drill.exec`, add the following:

        drill.exec.impersonation: {
          enabled: true,
          max_chained_user_hops: 3
        }

3. Verify that `enabled` is set to `true`.
4. Set the maximum number of chained user hops that you want Drill to allow.
5. (MapR cluster only) Add one of the following lines to the `drill-env.sh` file:
   * If the underlying file system is not secure, add the following line:
     `export MAPR_IMPERSONATION_ENABLED=true`
   * If the underlying file system has MapR security enabled, add the following line:
     `export MAPR_TICKETFILE_LOCATION=/opt/mapr/conf/mapruserticket`
6. Restart the Drillbit process on each Drill node.
   * In a MapR cluster, run the following command:
     `maprcli node services -name drill-bits -action restart -nodes <node-host-names> -f`
   * In a non-MapR environment, run the following command:
     `<drill_installation_directory>/bin/drillbit.sh restart`
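
After the Drillbits restart, you can optionally confirm the settings from any client. This is only a sketch: whether the impersonation entries appear in the sys.options table as BOOT options depends on your Drill version.

    -- Hypothetical check; assumes the impersonation settings are surfaced
    -- as drill.exec.impersonation.* entries among the BOOT options.
    SELECT * FROM sys.options
    WHERE type = 'BOOT' AND name LIKE 'drill.exec.impersonation%';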

## Impersonation and Chaining Example
Frank is a senior HR manager at a company. Frank has access to all of the employee data because he is a member of the hr group. Frank created a table named "employees" in his home directory to store the employee data he uses. Only Frank has access to this table.

    drwx------ frank:hr /user/frank/employees

Each record in the employees table consists of the following information:
emp_id, emp_name, emp_ssn, emp_salary, emp_addr, emp_phone, emp_mgr

Frank needs to share a subset of this information with Joe, an HR manager who reports to Frank. To share the employee data, Frank creates a view called emp_mgr_view that accesses a subset of the data. The emp_mgr_view filters out sensitive employee information, such as the employee social security numbers, and only shows data for the employees that report directly to Joe or the manager running the query on the view. Frank and Joe both belong to the mgr group. Managers have read permission on Frank's directory.

    rwxr----- frank:mgr /user/frank/emp_mgr_view.drill.view

The emp_mgr_view.drill.view file contains the following view definition:

    (view definition: SELECT emp_id, emp_name, emp_salary, emp_addr, emp_phone FROM `/user/frank/employee` WHERE emp_mgr = user())

When Joe issues SELECT * FROM emp_mgr_view, Drill impersonates Frank when accessing the employee data, and the query returns the data that Joe has permission to see based on the view definition. The query results do not include any sensitive data because the view protects that information. If Joe tries to query the employees table directly, Drill returns an error or null values.

Because Joe has read permissions on the emp_mgr_view, he can create new views from it to give other users access to the employee data, even though he does not own the employees table and cannot access it directly.

Joe needs to share employee contact data with his direct reports, so he creates a special view called emp_team_view to share the employee contact information with his team. Joe creates the view and writes it to his home directory. Joe and his reports belong to a group named joeteam. The joeteam group has read permissions on Joe's home directory so they can query the view and create new views from it.

    rwxr----- joe:joeteam /user/joe/emp_team_view.drill.view

The emp_team_view.drill.view file contains the following view definition:

    (view definition: SELECT emp_id, emp_name, emp_phone FROM `/user/frank/emp_mgr_view.drill`);

When anyone on Joe's team issues SELECT * FROM emp_team_view, Drill impersonates Joe to access the emp_team_view and then impersonates Frank to access the emp_mgr_view and the employee data. Drill returns the data that Joe's team can see based on the view definition. If anyone on Joe's team tries to query the emp_mgr_view or employees table directly, Drill returns an error or null values.

Because Joe's team has read permissions on the emp_team_view, they can create new views from it and write the views to any directory for which they have write access. Creating views can continue until Drill reaches the maximum number of impersonation hops.
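
Expressed as SQL, Joe's step might look like the following sketch. The workspace names are assumptions made for this illustration (dfs.joe mapped to /user/joe and dfs.frank to /user/frank); the SELECT simply mirrors the emp_team_view definition above.

    -- Hypothetical workspaces; Drill impersonates Joe, then Frank, when the
    -- resulting view is queried by Joe's team.
    CREATE VIEW dfs.joe.emp_team_view AS
    SELECT emp_id, emp_name, emp_phone
    FROM dfs.frank.emp_mgr_view;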

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/080-configuration-options.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/080-configuration-options.md b/_docs/manage-drill/080-configuration-options.md
deleted file mode 100644
index 41275eb..0000000
--- a/_docs/manage-drill/080-configuration-options.md
+++ /dev/null
@@ -1,9 +0,0 @@
---
title: "Configuration Options"
parent: "Manage Drill"
---

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/090-start-stop.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/090-start-stop.md b/_docs/manage-drill/090-start-stop.md
deleted file mode 100644
index 591b6ab..0000000
--- a/_docs/manage-drill/090-start-stop.md
+++ /dev/null
@@ -1,42 +0,0 @@
---
title: "Starting/Stopping Drill"
parent: "Manage Drill"
---
How you start Drill depends on the installation method you followed. If you installed Drill in embedded mode, invoking SQLLine automatically starts a Drillbit locally. If you installed Drill in distributed mode, and the Drillbit on a node did not start, start the Drillbit before attempting to run queries. How to start Drill is covered in detail in the section ["Install Drill"]({{site.baseurl}}/docs/install-drill/).

## Examples of Starting Drill
Issue the **sqlline** command from the Drill installation directory. The simplest way to start SQLLine is to identify the protocol, JDBC, and the ZooKeeper node or nodes in the **sqlline** command. This example starts SQLLine on a node in an embedded, single-node cluster:

    sqlline -u jdbc:drill:zk=local

This example also starts SQLLine using the `dfs` storage plugin. Specifying the storage plugin when you start up eliminates the need to specify the storage plugin in the query:

    bin/sqlline -u jdbc:drill:schema=dfs;zk=centos26

This command starts SQLLine in distributed (multi-node) mode in a cluster configured to run ZooKeeper on three nodes:

    bin/sqlline -u jdbc:drill:zk=centos23,zk=centos24,zk=centos26:5181

## Exiting SQLLine

To exit SQLLine, issue the following command:

    !quit

## Stopping Drill

In some cases, such as stopping while a query is in progress, the `!quit` command does not stop Drill running in embedded mode. In distributed mode, you stop the Drillbit service instead of killing the Drillbit process.

To stop the Drill process on Mac OS X and Linux, use the kill command. On Windows, use the **TaskKill** command.

For example, on Mac OS X and Linux, follow these steps:

1. Press CTRL+Z to stop the query, then start Drill again. If the startup message indicates success, skip the rest of the steps. If not, proceed to step 2.
2. Search for the Drill process IDs:

        $ ps auwx | grep drill

3. Kill each process using the process numbers in the grep output. For example:

        $ sudo kill -9 2674

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/100-ports-used-by-drill.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/100-ports-used-by-drill.md b/_docs/manage-drill/100-ports-used-by-drill.md
deleted file mode 100644
index 42ecd20..0000000
--- a/_docs/manage-drill/100-ports-used-by-drill.md
+++ /dev/null
@@ -1,15 +0,0 @@
---
title: "Ports Used by Drill"
parent: "Manage Drill"
---
The following table provides a list of the ports that Drill uses, the port type, and a description of how Drill uses the port:

| Port  | Type | Description |
|-------|------|-------------|
| 8047  | TCP  | Needed for the Drill Web UI. |
| 31010 | TCP  | User port address. Used between nodes in a Drill cluster. Needed for an external client, such as Tableau, to connect into the cluster nodes. Also needed for the Drill Web UI. |
| 31011 | TCP  | Control port address. Used between nodes in a Drill cluster. Needed for multi-node installation of Apache Drill. |
| 31012 | TCP  | Data port address. Used between nodes in a Drill cluster. Needed for multi-node installation of Apache Drill. |
| 46655 | UDP  | Used for JGroups and Infinispan. Needed for multi-node installation of Apache Drill. |
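
If your Drill build exposes the sys.drillbits system table, you can check which user, control, and data ports the running Drillbits have registered; this is only a sketch, and the table availability and column names may differ across versions:

    -- Ports registered by each active Drillbit (availability depends on version).
    SELECT * FROM sys.drillbits;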

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/110-partition-pruning.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/110-partition-pruning.md b/_docs/manage-drill/110-partition-pruning.md
deleted file mode 100644
index 75f2edd..0000000
--- a/_docs/manage-drill/110-partition-pruning.md
+++ /dev/null
@@ -1,75 +0,0 @@
---
title: "Partition Pruning"
parent: "Manage Drill"
---
Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.

For Drill to apply partition pruning to Hive tables, you must have created the tables in Hive using the `PARTITIONED BY` clause:

`CREATE TABLE <table_name> (<column_definitions>) PARTITIONED BY (<partition_column> <data_type>);`

When you create Hive tables using the `PARTITIONED BY` clause, each partition of data is automatically split out into different directories as data is written to disk. For more information about Hive partitioning, refer to the [Apache Hive wiki](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL/#LanguageManualDDL-PartitionedTables).

Typically, table data in a file system is organized by directories and subdirectories. Queries on table data may contain `WHERE` clause filters on specific directories.

Drill's query planner evaluates the filters as part of a Filter operator. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators downstream, such as Filter.

When partition filters are present, the query planner determines if it can push the filters down to the Scan such that the Scan only reads the directories that match the partition filters, thus reducing disk I/O.

## Partition Pruning Example

The `/Users/max/data/logs` directory in a file system contains subdirectories that span a few years.

The following image shows the hierarchical structure of the `…/logs` directory and subdirectories:

![drill query flow]({{ site.baseurl }}/docs/img/54.png)

The following query requests log file data for 2013 from the `…/logs` directory in the file system:

    SELECT * FROM dfs.`/Users/max/data/logs` WHERE cust_id < 10 and dir0 = 2013 limit 2;

If you run the `EXPLAIN PLAN` command for the query, you can see that the `…/logs` directory is filtered by the scan operator.

    EXPLAIN PLAN FOR SELECT * FROM dfs.`/Users/max/data/logs` WHERE cust_id < 10 and dir0 = 2013 limit 2;

The following image shows a portion of the physical plan when partition pruning is applied:

![drill query flow]({{ site.baseurl }}/docs/img/21.png)

## Filter Examples

The following queries include examples of the types of filters eligible for partition pruning optimization:

**Example 1: Partition filters ANDed together**

    SELECT * FROM dfs.`/Users/max/data/logs` WHERE dir0 = '2014' AND dir1 = '1'

**Example 2: Partition filter ANDed with regular column filter**

    SELECT * FROM dfs.`/Users/max/data/logs` WHERE cust_id < 10 AND dir0 = 2013 limit 2;

**Example 3: Combination of AND, OR involving partition filters**

    SELECT * FROM dfs.`/Users/max/data/logs` WHERE (dir0 = '2013' AND dir1 = '1') OR (dir0 = '2014' AND dir1 = '2')

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/120-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/120-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md b/_docs/manage-drill/120-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md
deleted file mode 100644
index 0033838..0000000
--- a/_docs/manage-drill/120-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md
+++ /dev/null
@@ -1,30 +0,0 @@
---
title: "Monitoring and Canceling Queries in the Drill Web UI"
parent: "Manage Drill"
---
You can monitor and cancel queries from the Drill Web UI. To access the Drill Web UI, the Drillbit process must be running on the Drill node that you use to access the Drill Web UI.

To monitor or cancel a query from the Drill Web UI, complete the following steps:

1. Navigate to the Drill Web UI at `<drillbit-hostname>:8047`. When you access the Drill Web UI, you see some general information about Drill running in your cluster, such as the nodes running the Drillbit process, the various ports Drill is using, and the amount of direct memory assigned to Drill.
![drill query flow]({{ site.baseurl }}/docs/img/7.png)

2. Select **Profiles** in the toolbar. A list of running and completed queries appears. Drill assigns a query ID to each query and lists the Foreman node. The Foreman is the Drillbit node that receives the query from the client or application. The Foreman drives the entire query.
![drill query flow]({{ site.baseurl }}/docs/img/51.png)

3. Click the **Query ID** for the query that you want to monitor or cancel. The Query and Planning window appears.
![drill query flow]({{ site.baseurl }}/docs/img/4.png)

4. Select **Edit Query**.

5. Click **Cancel query** to cancel the query. The following message appears:
![drill query flow]({{ site.baseurl }}/docs/img/46.png)

6. Optionally, you can re-run the query to see a query summary in this window.

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/configuration-options/010-configuration-options-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/configuration-options/010-configuration-options-introduction.md b/_docs/manage-drill/configuration-options/010-configuration-options-introduction.md
deleted file mode 100644
index 0298006..0000000
--- a/_docs/manage-drill/configuration-options/010-configuration-options-introduction.md
+++ /dev/null
@@ -1,407 +0,0 @@
---
title: "Configuration Options Introduction"
parent: "Configuration Options"
---
Drill provides many configuration options that you can enable, disable, or modify. Modifying certain configuration options can impact Drill's performance. Many of Drill's configuration options reside in the `drill-env.sh` and `drill-override.conf` files. Drill stores these files in the `<drill_installation_directory>/conf` directory. Drill sources `/etc/drill/conf` if it exists. Otherwise, Drill sources the local `<drill_installation_directory>/conf` directory.

The sys.options table in Drill contains information about the boot (start-up) and system options listed in the tables on this page.

## Boot Options
The section ["Start-up Options"]({{site.baseurl}}/docs/start-up-options) covers how to configure and view these options.

| Name | Default | Comments |
|------|---------|----------|
| drill.exec.buffer.impl | "org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer" | |
| drill.exec.buffer.size | 6 | Available memory in terms of record batches to hold data downstream of an operation. Increase this value to increase query speed. |
| drill.exec.compile.debug | TRUE | |
| drill.exec.http.enabled | TRUE | |
| drill.exec.operator.packages | "org.apache.drill.exec.physical.config" | |
| drill.exec.sort.external.batch.size | 4000 | |
| drill.exec.sort.external.spill.directories | "/tmp/drill/spill" | Determines which directory to use for spooling. |
| drill.exec.sort.external.spill.group.size | 100 | |
| drill.exec.storage.file.text.batch.size | 4000 | |
| drill.exec.storage.packages | "org.apache.drill.exec.store", "org.apache.drill.exec.store.mock" | Ignore or include this module, including supplementary configuration information, when scanning the class path. This file is in [HOCON format](https://github.com/typesafehub/config/blob/master/HOCON.md). |
| drill.exec.sys.store.provider.class | ZooKeeper: "org.apache.drill.exec.store.sys.zk.ZkPStoreProvider" | The PStore (Persistent Configuration Storage) provider to use. The PStore holds configuration and profile data. |
| drill.exec.zk.connect | "localhost:2181" | The ZooKeeper quorum that Drill uses to connect to data sources. Configure on each Drillbit node. |
| drill.exec.zk.refresh | 500 | |
| file.separator | "/" | |
| java.specification.version | 1.7 | |
| java.vm.name | "Java HotSpot(TM) 64-Bit Server VM" | |
| java.vm.specification.version | 1.7 | |
| log.path | "/log/sqlline.log" | |
| sun.boot.library.path | /Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/jre/lib | |
| sun.java.command | "sqlline.SqlLine -d org.apache.drill.jdbc.Driver --maxWidth=10000 -u jdbc:drill:zk=local" | |
| sun.os.patch.level | unknown | |
| user | "" | |
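
Because these values come from the sys.options table, you can inspect any of them from a SQL client. A small example (the LIKE pattern is only an illustration):

    -- Shows the ZooKeeper-related boot options listed in the table above.
    SELECT * FROM sys.options WHERE type = 'BOOT' AND name LIKE 'drill.exec.zk%';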

## System Options
The sys.options table lists the following options that you can set at the session or system level, as described in the section ["Planning and Execution Options"]({{site.baseurl}}/docs/planning-and-execution-options).

| Name | Default | Comments |
|------|---------|----------|
| drill.exec.functions.cast_empty_string_to_null | FALSE | |
| drill.exec.storage.file.partition.column.label | dir | Accepts a string input. |
| exec.errors.verbose | FALSE | Toggles verbose output of executable error messages. |
| exec.java_compiler | DEFAULT | Switches between DEFAULT, JDK, and JANINO mode for the current session. Uses Janino by default for generated source code of less than exec.java_compiler_janino_maxsize; otherwise, switches to the JDK compiler. |
| exec.java_compiler_debug | TRUE | Toggles the output of debug-level compiler error messages in runtime generated code. |
| exec.java_compiler_janino_maxsize | 262144 | See the exec.java_compiler option comment. Accepts inputs of type LONG. |
| exec.max_hash_table_size | 1073741824 | Ending size for hash tables. Range: 0 - 1073741824 |
| exec.min_hash_table_size | 65536 | Starting size for hash tables. Increase according to available memory to improve performance. Range: 0 - 1073741824 |
| exec.queue.enable | FALSE | Changes the state of query queues to control the number of queries that run simultaneously. |
| exec.queue.large | 10 | Range: 0-1000 |
| exec.queue.small | 100 | Range: 0-1001 |
| exec.queue.threshold | 30000000 | Range: 0-9223372036854775807 |
| exec.queue.timeout_millis | 300000 | Range: 0-9223372036854775807 |
| planner.add_producer_consumer | FALSE | Increase prefetching of data from disk. Disable for in-memory reads. |
| planner.affinity_factor | 1.2 | Accepts inputs of type DOUBLE. |
| planner.broadcast_factor | 1 | |
| planner.broadcast_threshold | 10000000 | Threshold in number of rows that triggers a broadcast join for a query if the right side of the join contains fewer rows than the threshold. Avoids broadcasting too many rows to join. Range: 0-2147483647 |
| planner.disable_exchanges | FALSE | Toggles the state of hashing to a random exchange. |
| planner.enable_broadcast_join | TRUE | Changes the state of aggregation and join operators. Do not disable. |
| planner.enable_demux_exchange | FALSE | Toggles the state of hashing to a demultiplexed exchange. |
| planner.enable_hash_single_key | TRUE | |
| planner.enable_hashagg | TRUE | Enable hash aggregation; otherwise, Drill does a sort-based aggregation. Does not write to disk. Enabling is recommended. |
| planner.enable_hashjoin | TRUE | Enable the memory-hungry hash join. Does not write to disk. |
| planner.enable_hashjoin_swap | | |
| planner.enable_mergejoin | TRUE | Sort-based operation. Writes to disk. |
| planner.enable_multiphase_agg | TRUE | |
| planner.enable_mux_exchange | TRUE | Toggles the state of hashing to a multiplexed exchange. |
| planner.enable_streamagg | TRUE | Sort-based operation. Writes to disk. |
| planner.identifier_max_length | 1024 | |
| planner.join.hash_join_swap_margin_factor | 10 | |
| planner.join.row_count_estimate_factor | 1 | |
| planner.memory.average_field_width | 8 | |
| planner.memory.enable_memory_estimation | FALSE | |
| planner.memory.hash_agg_table_factor | 1.1 | |
| planner.memory.hash_join_table_factor | 1.1 | |
| planner.memory.max_query_memory_per_node | 2147483648 | |
| planner.memory.non_blocking_operators_memory | 64 | Range: 0-2048 |
| planner.partitioner_sender_max_threads | 8 | |
| planner.partitioner_sender_set_threads | -1 | |
| planner.partitioner_sender_threads_factor | 1 | |
| planner.producer_consumer_queue_size | 10 | How much data to prefetch from disk (in record batches) out of band of query execution. |
| planner.slice_target | 100000 | The number of records manipulated within a fragment before Drill parallelizes operations. |
| planner.width.max_per_node | 3 | The maximum degree of distribution of a query across cores and cluster nodes. |
| planner.width.max_per_query | 1000 | Same as max per node but applies to the query as executed by the entire cluster. |
| store.format | parquet | Output format for data written to tables with the CREATE TABLE AS (CTAS) command. Allowed values are parquet, json, or text. |
| store.json.all_text_mode | FALSE | Drill reads all data from the JSON files as VARCHAR. Prevents schema change errors. |
| store.mongo.all_text_mode | FALSE | Similar to store.json.all_text_mode for MongoDB. |
| store.parquet.block-size | 536870912 | Sets the size of a Parquet row group to the number of bytes less than or equal to the block size of MFS, HDFS, or the file system. |
| store.parquet.compression | snappy | Compression type for storing Parquet output. Allowed values: snappy, gzip, none |
| store.parquet.enable_dictionary_encoding\* | FALSE | Do not change. |
| store.parquet.use_new_reader | FALSE | Not supported. |
| window.enable\* | FALSE | Coming soon. |

\* Not supported in this release.
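
As a quick illustration of the two scopes (the values shown are only examples, not recommendations):

    -- Session level: applies only to the current connection.
    ALTER SESSION SET `exec.errors.verbose` = true;
    -- System level: persists for all sessions and across restarts.
    ALTER SYSTEM SET `planner.width.max_per_node` = 3;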

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/configuration-options/020-start-up-options.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/configuration-options/020-start-up-options.md b/_docs/manage-drill/configuration-options/020-start-up-options.md
deleted file mode 100644
index 8a06232..0000000
--- a/_docs/manage-drill/configuration-options/020-start-up-options.md
+++ /dev/null
@@ -1,63 +0,0 @@
---
title: "Start-Up Options"
parent: "Configuration Options"
---
Drill's start-up options reside in files that use the HOCON configuration format, which is a hybrid between a properties file and a JSON file. Drill start-up options consist of a group of files with a nested relationship. At the core of the file hierarchy is `drill-default.conf`. This file is overridden by one or more `drill-module.conf` files, which are overridden by the `drill-override.conf` file that you define.

You can see the following group of files throughout the source repository in Drill:

    common/src/main/resources/drill-default.conf
    common/src/main/resources/drill-module.conf
    contrib/storage-hbase/src/main/resources/drill-module.conf
    contrib/storage-hive/core/src/main/resources/drill-module.conf
    contrib/storage-hive/hive-exec-shade/src/main/resources/drill-module.conf
    exec/java-exec/src/main/resources/drill-module.conf
    distribution/src/resources/drill-override.conf

These files are listed inside the associated JAR files in the Drill distribution tarball.

Each Drill module has a set of options that Drill incorporates. Drill's modular design enables you to create new storage plugins, set new operators, or create UDFs. You can also include additional configuration options that you can override as necessary.

When you add a JAR file to Drill, you must include a `drill-module.conf` file in the root directory of the JAR file that you add. The `drill-module.conf` file tells Drill to scan that JAR file or associated object and include it.

## Viewing Startup Options

You can run the following query to see a list of Drill's startup options:

    SELECT * FROM sys.options WHERE type='BOOT'

## Configuring Start-Up Options

You can configure start-up options for each Drillbit in the `drill-override.conf` file located in Drill's `<drill_installation_directory>/conf` directory.

The summary of start-up options, also known as boot options, lists default values. The following descriptions provide more detail on key options that are frequently reconfigured:

* drill.exec.sys.store.provider.class

  Defines the persistent storage (PStore) provider. The [PStore]({{ site.baseurl }}/docs/persistent-configuration-storage) holds configuration and profile data.

* drill.exec.buffer.size

  Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make data immediately available. This requires Drill to use memory to hold the data pending operations. When data on a downstream operation is required, that data is immediately available so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.

* drill.exec.sort.external.spill.directories

  Tells Drill which directory to use when spooling. Drill uses a spool and sort operation for operations that exceed available memory.
  The sorting operation is designed to spool to a Hadoop file system. The default Hadoop file system is a local file system in the /tmp directory. Spooling performance (both writing and reading back from it) is constrained by the file system. For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling purposes. Volumes improve performance and stripe data across as many disks as possible.

* drill.exec.zk.connect

  Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/configuration-options/030-planning-and-exececution-options.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/configuration-options/030-planning-and-exececution-options.md b/_docs/manage-drill/configuration-options/030-planning-and-exececution-options.md
deleted file mode 100644
index f7d3442..0000000
--- a/_docs/manage-drill/configuration-options/030-planning-and-exececution-options.md
+++ /dev/null
@@ -1,60 +0,0 @@
---
title: "Planning and Execution Options"
parent: "Configuration Options"
---
You can set Drill query planning and execution options per cluster, at the system or session level. Options set at the session level only apply to queries that you run during the current Drill connection. Options set at the system level affect the entire system and persist between restarts. Session-level settings override system-level settings.

You can run the following query to see a list of the system and session planning and execution options:

    SELECT name FROM sys.options WHERE type in ('SYSTEM', 'SESSION');

## Configuring Planning and Execution Options

Use the ALTER SYSTEM or ALTER SESSION commands to set options. Typically, you set the options at the session level unless you want the setting to persist across all sessions.

The summary of system options lists default values. The following descriptions provide more detail on some of these options:

### exec.min_hash_table_size

The default starting size for hash tables. Increasing this size is useful for very large aggregations or joins when you have large amounts of memory for Drill to use. Drill can spend a lot of time resizing the hash table as it finds new data. If you have large data sets, you can increase this hash table size to increase performance.

### planner.add_producer_consumer

This option enables or disables a secondary reading thread that works out of band of the rest of the scanning fragment to prefetch data from disk. If you interact with a certain type of storage medium that is slow or does not prefetch much data, this option tells Drill to add a producer consumer reading thread to the operation. Drill can then assign one thread that focuses on a single reading fragment. If Drill is using memory, you can disable this option to get better performance. If Drill is using disk space, you should enable this option and set a reasonable queue size for the planner.producer_consumer_queue_size option.

### planner.broadcast_threshold

Threshold, in terms of a number of rows, that determines whether a broadcast join is chosen for a query.
Regardless of the setting of the broadcast_join option (enabled or disabled), a broadcast join is not chosen unless the right side of the join is estimated to contain fewer rows than this threshold. The intent of this option is to avoid broadcasting too many rows for join purposes. Broadcasting involves sending data across nodes and is a network-intensive operation. (The "right side" of the join, which may itself be a join or simply a table, is determined by cost-based optimizations and heuristics during physical planning.)

### planner.enable_broadcast_join, planner.enable_hashagg, planner.enable_hashjoin, planner.enable_mergejoin, planner.enable_multiphase_agg, planner.enable_streamagg

These options enable or disable specific aggregation and join operators for queries. These operators are all enabled by default and in general should not be disabled.

Hash aggregation and hash join are hash-based operations. Streaming aggregation and merge join are sort-based operations. Both hash-based and sort-based operations consume memory; however, currently, hash-based operations do not spill to disk as needed, but the sort-based operations do. If large hash operations do not fit in memory on your system, you may need to disable these operations. Queries will continue to run, using alternative plans.

### planner.producer_consumer_queue_size

Determines how much data to prefetch from disk (in record batches) out of band of query execution. The larger the queue size, the greater the amount of memory that the queue and overall query execution consumes.

### planner.width.max_per_node

In this context *width* refers to fanout or distribution potential: the ability to run a query in parallel across the cores on a node and the nodes on a cluster. A physical plan consists of intermediate operations, known as query "fragments," that run concurrently, yielding opportunities for parallelism above and below each exchange operator in the plan. An exchange operator represents a breakpoint in the execution flow where processing can be distributed. For example, a single-process scan of a file may flow into an exchange operator, followed by a multi-process aggregation fragment.

The maximum width per node defines the maximum degree of parallelism for any fragment of a query, but the setting applies at the level of a single node in the cluster. The *default* maximum degree of parallelism per node is calculated as follows, with the theoretical maximum automatically scaled back (and rounded) so that only 70% of the actual available capacity is taken into account: number of active drillbits (typically one per node) * number of cores per node * 0.7

For example, on a single-node test system with 2 cores and hyper-threading enabled: 1 * 4 * 0.7 = 3

When you modify the default setting, you can supply any meaningful number. The system does not automatically scale down your setting.

### planner.width.max_per_query

The max_per_query value also sets the maximum degree of parallelism for any given stage of a query, but the setting applies to the query as executed by the whole cluster (multiple nodes). In effect, the actual maximum width per query is the *minimum of two values*: min((number of nodes * width.max_per_node), width.max_per_query)

For example, on a 4-node cluster where `width.max_per_node` is set to 6 and `width.max_per_query` is set to 30: min((4 * 6), 30) = 24

In this case, the effective maximum width per query is 24, not 30.
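
If you want to pin these widths yourself rather than rely on the computed defaults, the options can be set like any other planning option; the values below simply reuse the 4-node example above and are not recommendations:

    -- Session-scoped overrides; use ALTER SYSTEM to apply them cluster-wide.
    ALTER SESSION SET `planner.width.max_per_node` = 6;
    ALTER SESSION SET `planner.width.max_per_query` = 30;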

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/manage-drill/configuration-options/040-persistent-configuration-storage.md
----------------------------------------------------------------------
diff --git a/_docs/manage-drill/configuration-options/040-persistent-configuration-storage.md b/_docs/manage-drill/configuration-options/040-persistent-configuration-storage.md
deleted file mode 100644
index 59180b5..0000000
--- a/_docs/manage-drill/configuration-options/040-persistent-configuration-storage.md
+++ /dev/null
@@ -1,92 +0,0 @@
---
title: "Persistent Configuration Storage"
parent: "Configuration Options"
---
Drill stores persistent configuration data in a persistent configuration store (PStore). This data is encoded in JSON or Protobuf format. Drill can use the local file system, ZooKeeper, HBase, or MapR-DB to store this data. The data stored in a PStore includes state information for storage plugins, query profiles, and ALTER SYSTEM settings. The default type of PStore configured depends on the Drill installation mode.

The following table provides the persistent storage mode for each of the Drill modes:

| Mode | Description |
|------|-------------|
| Embedded | Drill stores persistent data in the local file system. You cannot modify the PStore location for Drill in embedded mode. |
| Distributed | Drill stores persistent data in ZooKeeper, by default. You can modify where ZooKeeper offloads data, or you can change the persistent storage mode to HBase or MapR-DB. |

{% include startnote.html %}Switching between storage modes does not migrate configuration data.{% include endnote.html %}

## ZooKeeper for Persistent Configuration Storage

To make Drill installation and configuration simple, Drill uses ZooKeeper to store persistent configuration data. The ZooKeeper PStore provider stores all of the persistent configuration data in ZooKeeper except for query profile data.

The ZooKeeper PStore provider offloads query profile data to the ${DRILL_LOG_DIR:-/var/log/drill} directory on Drill nodes. If you want the query profile data stored in a specific location, you can configure where ZooKeeper offloads the data.

To modify where the ZooKeeper PStore provider offloads query profile data, configure the `sys.store.provider.zk.blobroot` property in the `drill.exec` block in `<drill_installation_directory>/conf/drill-override.conf` on each Drill node and then restart the Drillbit service.

**Example**

    drill.exec: {
      cluster-id: "my_cluster_com-drillbits",
      zk.connect: "<zk-hostname>:<port>",
      sys.store.provider.zk.blobroot: "maprfs://<directory-path>/"
    }

Issue the following command to restart the Drillbit on all Drill nodes:

    maprcli node services -name drill-bits -action restart -nodes <node-host-names>

## HBase for Persistent Configuration Storage

To change the persistent storage mode for Drill, add or modify the `sys.store.provider` block in `<drill_installation_directory>/conf/drill-override.conf`.

**Example**

    sys.store.provider: {
      class: "org.apache.drill.exec.store.hbase.config.HBasePStoreProvider",
      hbase: {
        table : "drill_store",
        config: {
          "hbase.zookeeper.quorum": "<zk-host1>,<zk-host2>,<zk-host3>,<zk-host4>",
          "hbase.zookeeper.property.clientPort": "2181"
        }
      }
    },

## MapR-DB for Persistent Configuration Storage

If you have MapR-DB in your cluster, you can use MapR-DB for persistent configuration storage. Using MapR-DB to store persistent configuration data can prevent memory strain on ZooKeeper in clusters running heavy workloads.

To change the persistent storage mode to MapR-DB, add or modify the `sys.store.provider` block in `<drill_installation_directory>/conf/drill-override.conf` on each Drill node and then restart the Drillbit service.

**Example**

    sys.store.provider: {
      class: "org.apache.drill.exec.store.hbase.config.HBasePStoreProvider",
      hbase: {
        table : "/tables/drill_store",
      }
    },

Issue the following command to restart the Drillbit on all Drill nodes:

    maprcli node services -name drill-bits -action restart -nodes <node-host-names>

http://git-wip-us.apache.org/repos/asf/drill/blob/27067e8c/_docs/query-data/080-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/080-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md b/_docs/query-data/080-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md
new file mode 100644
index 0000000..9c7cdc2
--- /dev/null
+++ b/_docs/query-data/080-monitoring-and-canceling-queries-in-the-Drill-Web-UI.md
@@ -0,0 +1,30 @@
---
title: "Monitoring and Canceling Queries in the Drill Web UI"
parent: "Query Data"
---
You can monitor and cancel queries from the Drill Web UI. To access the Drill Web UI, the Drillbit process must be running on the Drill node that you use to access the Drill Web UI.

To monitor or cancel a query from the Drill Web UI, complete the following steps:

1. Navigate to the Drill Web UI at `<drillbit-hostname>:8047`. When you access the Drill Web UI, you see some general information about Drill running in your cluster, such as the nodes running the Drillbit process, the various ports Drill is using, and the amount of direct memory assigned to Drill.
![drill query flow]({{ site.baseurl }}/docs/img/7.png)

2. Select **Profiles** in the toolbar. A list of running and completed queries appears. Drill assigns a query ID to each query and lists the Foreman node. The Foreman is the Drillbit node that receives the query from the client or application. The Foreman drives the entire query.
![drill query flow]({{ site.baseurl }}/docs/img/51.png)

3. Click the **Query ID** for the query that you want to monitor or cancel. The Query and Planning window appears.
![drill query flow]({{ site.baseurl }}/docs/img/4.png)

4. Select **Edit Query**.

5. Click **Cancel query** to cancel the query. The following message appears:
![drill query flow]({{ site.baseurl }}/docs/img/46.png)

6. Optionally, you can re-run the query to see a query summary in this window.