drill-commits mailing list archives

From bridg...@apache.org
Subject [12/12] drill git commit: Merge branch 'gh-pages-master' into gh-pages
Date Tue, 17 Mar 2015 21:02:53 GMT
Merge branch 'gh-pages-master' into gh-pages


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/fbc18c48
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/fbc18c48
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/fbc18c48

Branch: refs/heads/gh-pages
Commit: fbc18c480ffd6a2ccb878a4beb3584c8d3d0b64e
Parents: 2856ae4 feaa579
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Tue Mar 17 14:02:01 2015 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Tue Mar 17 14:02:01 2015 -0700

----------------------------------------------------------------------
 _docs/005-connect.md                            |  27 +-
 _docs/008-sql-ref.md                            |   4 +-
 _docs/009-datasources.md                        |  25 +
 _docs/009-dev-custom-func.md                    |  37 --
 _docs/010-dev-custom-func.md                    |  37 ++
 _docs/010-manage.md                             |  14 -
 _docs/011-develop.md                            |   9 -
 _docs/011-manage.md                             |  14 +
 _docs/012-develop.md                            |   9 +
 _docs/012-rn.md                                 | 191 -------
 _docs/013-contribute.md                         |   9 -
 _docs/013-rn.md                                 | 191 +++++++
 _docs/014-contribute.md                         |   9 +
 _docs/014-sample-ds.md                          |  10 -
 _docs/015-design.md                             |  13 -
 _docs/015-sample-ds.md                          |  10 +
 _docs/016-design.md                             |  13 +
 _docs/016-progress.md                           |   8 -
 _docs/018-bylaws.md                             | 170 -------
 _docs/018-progress.md                           |   8 +
 _docs/019-bylaws.md                             | 170 +++++++
 _docs/connect/001-plugin-reg.md                 |  43 +-
 _docs/connect/002-plugin-conf.md                | 130 +++++
 _docs/connect/002-workspaces.md                 |  74 ---
 _docs/connect/003-reg-fs.md                     |  64 ---
 _docs/connect/003-workspaces.md                 |  74 +++
 _docs/connect/004-reg-fs.md                     |  64 +++
 _docs/connect/004-reg-hbase.md                  |  32 --
 _docs/connect/005-reg-hbase.md                  |  34 ++
 _docs/connect/005-reg-hive.md                   |  83 ---
 _docs/connect/006-default-frmt.md               |  60 ---
 _docs/connect/006-reg-hive.md                   |  82 +++
 _docs/connect/007-default-frmt.md               |  69 +++
 _docs/connect/007-mongo-plugin.md               | 167 ------
 _docs/connect/008-mapr-db-plugin.md             |  31 --
 _docs/connect/008-mongo-plugin.md               | 167 ++++++
 _docs/connect/009-mapr-db-plugin.md             |  30 ++
 _docs/contribute/001-guidelines.md              |   3 +-
 _docs/data-sources/001-hive-types.md            | 180 +++++++
 _docs/data-sources/002-hive-udf.md              |  40 ++
 _docs/data-sources/003-parquet-ref.md           | 269 ++++++++++
 _docs/data-sources/004-json-ref.md              | 504 +++++++++++++++++++
 _docs/img/Hbase_Browse.png                      | Bin 147495 -> 148451 bytes
 _docs/img/StoragePluginConfig.png               | Bin 20403 -> 0 bytes
 _docs/img/Untitled.png                          | Bin 39796 -> 0 bytes
 _docs/img/connect-plugin.png                    | Bin 0 -> 24774 bytes
 _docs/img/data-sources-schemachg.png            | Bin 0 -> 8071 bytes
 _docs/img/datasources-json-bracket.png          | Bin 0 -> 30129 bytes
 _docs/img/datasources-json.png                  | Bin 0 -> 16364 bytes
 _docs/img/get2kno_plugin.png                    | Bin 0 -> 55794 bytes
 _docs/img/json-workaround.png                   | Bin 0 -> 27547 bytes
 _docs/img/plugin-default.png                    | Bin 0 -> 56412 bytes
 _docs/install/001-drill-in-10.md                |   2 +-
 _docs/interfaces/001-odbc-win.md                |   3 +-
 .../interfaces/odbc-win/003-connect-odbc-win.md |   2 +-
 .../interfaces/odbc-win/004-tableau-examples.md |   6 +-
 _docs/manage/002-start-stop.md                  |   2 +-
 _docs/manage/003-ports.md                       |   2 +-
 _docs/manage/conf/002-startup-opt.md            |   3 +-
 _docs/manage/conf/003-plan-exec.md              |   3 +-
 _docs/manage/conf/004-persist-conf.md           |   2 +-
 _docs/query/001-get-started.md                  |  75 +++
 _docs/query/001-query-fs.md                     |  35 --
 _docs/query/002-query-fs.md                     |  35 ++
 _docs/query/002-query-hbase.md                  | 151 ------
 _docs/query/003-query-complex.md                |  56 ---
 _docs/query/003-query-hbase.md                  | 151 ++++++
 _docs/query/004-query-complex.md                |  56 +++
 _docs/query/004-query-hive.md                   |  45 --
 _docs/query/005-query-hive.md                   |  45 ++
 _docs/query/005-query-info-skema.md             | 109 ----
 _docs/query/006-query-info-skema.md             | 109 ++++
 _docs/query/006-query-sys-tbl.md                | 159 ------
 _docs/query/007-query-sys-tbl.md                | 159 ++++++
 _docs/query/get-started/001-lesson1-connect.md  |  88 ++++
 _docs/query/get-started/002-lesson2-download.md | 103 ++++
 _docs/query/get-started/003-lesson3-plugin.md   | 142 ++++++
 _docs/sql-ref/001-data-types.md                 | 215 +++++---
 _docs/sql-ref/002-lexical-structure.md          | 145 ++++++
 _docs/sql-ref/002-operators.md                  |  70 ---
 _docs/sql-ref/003-functions.md                  | 185 -------
 _docs/sql-ref/003-operators.md                  |  70 +++
 _docs/sql-ref/004-functions.md                  | 186 +++++++
 _docs/sql-ref/004-nest-functions.md             |  10 -
 _docs/sql-ref/005-cmd-summary.md                |   9 -
 _docs/sql-ref/005-nest-functions.md             |  10 +
 _docs/sql-ref/006-cmd-summary.md                |   9 +
 _docs/sql-ref/006-reserved-wds.md               |  16 -
 _docs/sql-ref/007-reserved-wds.md               |  16 +
 _docs/sql-ref/data-types/001-date.md            | 206 ++++----
 .../data-types/002-disparate-data-types.md      | 321 ++++++++++++
 _docs/tutorial/002-get2kno-sb.md                | 241 +++------
 _docs/tutorial/003-lesson1.md                   |  44 +-
 _docs/tutorial/005-lesson3.md                   |  98 ++--
 .../install-sandbox/001-install-mapr-vm.md      |   2 +-
 .../install-sandbox/002-install-mapr-vb.md      |   2 +-
 96 files changed, 4268 insertions(+), 2308 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/connect/006-reg-hive.md
----------------------------------------------------------------------
diff --cc _docs/connect/006-reg-hive.md
index 0000000,cf9b72a..dfb03dc
mode 000000,100644..100644
--- a/_docs/connect/006-reg-hive.md
+++ b/_docs/connect/006-reg-hive.md
@@@ -1,0 -1,82 +1,82 @@@
+ ---
+ title: "Hive Storage Plugin"
+ parent: "Storage Plugin Configuration"
+ ---
+ You can register a storage plugin instance that connects Drill to a Hive data
+ source that has a remote or embedded metastore service. When you register a
+ storage plugin instance for a Hive data source, provide a unique name for the
+ instance, and identify the type as `hive`. You must also provide the
+ metastore connection information.
+ 
+ Drill supports Hive 1.0. To access Hive tables
+ using custom SerDes or InputFormat/OutputFormat, all nodes running Drillbits
+ must have the SerDes or InputFormat/OutputFormat `JAR` files in the 
+ `<drill_installation_directory>/jars/3rdparty` folder.
+ 
+ ## Hive Remote Metastore
+ 
+ In this configuration, the Hive metastore runs as a separate service outside
+ of Hive. Drill communicates with the Hive metastore through Thrift. The
+ metastore service communicates with the Hive database over JDBC. Point Drill
+ to the Hive metastore service address, and provide the connection parameters
+ in the Drill Web UI to configure the connection.
+ 
+ **Note:** Verify that the Hive metastore service is running before you register the Hive metastore.
+ 
+ To register a remote Hive metastore with Drill, complete the following steps:
+ 
+   1. Issue the following command to start the Hive metastore service on the system specified in `hive.metastore.uris`:
+ 
+         hive --service metastore
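+ 
+      Before continuing, you can confirm that the metastore service is listening on its port (9083 is the Hive metastore default; an assumption here, since your deployment may use a different port):
+ 
+          netstat -an | grep 9083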
+   2. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
+   3. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+   4. In the configuration window, add the `Thrift URI` and port to `hive.metastore.uris`.
+ 
+      **Example**
+      
+         {
+           "type": "hive",
+           "enabled": true,
+           "configProps": {
+             "hive.metastore.uris": "thrift://<localhost>:<port>",  
+             "hive.metastore.sasl.enabled": "false"
+           }
+         }       
+   5. Click **Enable**.
+   6. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`:
+ 
+         export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
+ 
+ Once you have configured a storage plugin instance for a Hive data source, you
+ can [query Hive tables](/docs/querying-hive/).
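+ 
+ For example, once the plugin is enabled you might query a hypothetical Hive table named `students` from SQLLine:
+ 
+     0: jdbc:drill:zk=local> SELECT * FROM hive.`students` LIMIT 5;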
+ 
+ ## Hive Embedded Metastore
+ 
+ In this configuration, the Hive metastore is embedded within the Drill
+ process. Provide the metastore database configuration settings in the Drill
+ Web UI. Before you register Hive, verify that the driver you use to connect to
+ the Hive metastore is in the Drill classpath, located in `/<drill
+ installation directory>/lib`. If the driver is not there, copy the driver to `/<drill
+ installation directory>/lib` on the Drill node. For more information about
+ storage types and configurations, refer to ["Hive Metastore Administration"](https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin).
+ 
+ To register an embedded Hive metastore with Drill, complete the following
+ steps:
+ 
+   1. Navigate to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
+   2. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+   3. In the configuration window, add the database configuration settings.
+ 
+      **Example**
+      
+         {
+           "type": "hive",
+           "enabled": true,
+           "configProps": {
+             "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>;create=true",
+             "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
+             "fs.default.name": "file:///"
+           }
+         }
+   4. Click **Enable**.
+   5. Verify that `HADOOP_CLASSPATH` is set in `drill-env.sh`. If you need to set the classpath, add the following line to `drill-env.sh`:
+   
 -        export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
++        export HADOOP_CLASSPATH=/<directory path>/hadoop/hadoop-<version-number>
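+ 
+      For example, on a node where Hadoop is installed under `/opt/hadoop` (a placeholder path and version; substitute your own), the entry in `drill-env.sh` would look like this:
+ 
+          export HADOOP_CLASSPATH=/opt/hadoop/hadoop-2.4.1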

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/connect/007-default-frmt.md
----------------------------------------------------------------------
diff --cc _docs/connect/007-default-frmt.md
index 0000000,9325bdb..fc10c16
mode 000000,100644..100644
--- a/_docs/connect/007-default-frmt.md
+++ b/_docs/connect/007-default-frmt.md
@@@ -1,0 -1,69 +1,69 @@@
+ ---
+ title: "Drill Default Input Format"
+ parent: "Storage Plugin Configuration"
+ ---
+ You can define a default input format to tell Drill what file type exists in a
+ workspace within a file system. Drill determines the file type based on file
+ extensions and magic numbers when searching a workspace.
+ 
+ Magic numbers are file signatures that Drill uses to identify Parquet files.
+ If Drill cannot identify the file type based on file extensions or magic
+ numbers, the query fails. Defining a default input format can prevent queries
+ from failing in situations where Drill cannot determine the file type.
+ 
+ If you incorrectly define the file type in a workspace and Drill cannot
+ determine the file type, the query fails. For example, if the directory for
+ which you have defined a workspace contains JSON files and you defined the
+ default input format as CSV, the query fails against the workspace.
+ 
+ You can define one default input format per workspace. If you do not define a
+ default input format, and Drill cannot detect the file format, the query
+ fails. You can define a default input format for any of the file types that
+ Drill supports. Currently, Drill supports the following types:
+ 
+   * CSV
+   * TSV
+   * PSV
+   * Parquet
+   * JSON
+ 
+ ## Defining a Default Input Format
+ 
+ You define the default input format for a file system workspace through the
+ Drill Web UI. You must have a [defined workspace](/docs/workspaces) before you
+ can define a default input format.
+ 
+ To define a default input format for a workspace, complete the following
+ steps:
+ 
+   1. Navigate to the Drill Web UI at `<drill_node_ip_address>:8047`. The Drillbit process must be running on the node before you connect to the Drill Web UI.
+   2. Select **Storage** in the toolbar.
+   3. Click **Update** next to the file system for which you want to define a default input format for a workspace.
+   4. In the Configuration area, locate the workspace for which you would like to define the default input format, and change the `defaultInputFormat` attribute to any of the supported file types.
+ 
+      **Example**
+      
+         {
+           "type": "file",
+           "enabled": true,
+           "connection": "hdfs:///",
+           "workspaces": {
+             "root": {
+               "location": "/drill/testdata",
+               "writable": false,
+               "defaultInputFormat": "csv"
+             },
+             "local": {
+               "location": "/max/proddata",
+               "writable": true,
+               "defaultInputFormat": "json"
+             }
+           }
+         }
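+ 
+      With this configuration, Drill reads extensionless files in the `root` workspace as CSV. Assuming the plugin is registered as `hdfs` (an assumption, as is the file name), a query might look like this:
+ 
+         0: jdbc:drill:zk=local> SELECT * FROM hdfs.root.`sample_export` LIMIT 10;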
+ 
+ ## Querying Compressed JSON
+ 
+ You can use Drill 0.8 and later to query compressed JSON in .gz files as well as
+ uncompressed files having the .json extension. First, add the gz extension to a
+ storage plugin, and then use that plugin to query the compressed file.
+ 
+       "extensions": [
+         "json",
+         "gz"
 -      ]
++      ]
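+ 
+ For example, with the `gz` extension added to the `dfs` storage plugin (an assumption; use whichever plugin you modified, and substitute your own file path), you can query the compressed file directly:
+ 
+     0: jdbc:drill:zk=local> SELECT * FROM dfs.`/tmp/sample.json.gz` LIMIT 5;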

http://git-wip-us.apache.org/repos/asf/drill/blob/fbc18c48/_docs/manage/conf/002-startup-opt.md
----------------------------------------------------------------------
diff --cc _docs/manage/conf/002-startup-opt.md
index 898a7ba,d1766fb..e0b64bf
--- a/_docs/manage/conf/002-startup-opt.md
+++ b/_docs/manage/conf/002-startup-opt.md
@@@ -46,5 -46,5 +46,4 @@@ override.conf` file located in Drill’
  You may want to configure the following start-up options that control certain
  behaviors in Drill:
  
+ <table><tbody>
+ <tr><th>Option</th><th>Default Value</th><th>Description</th></tr>
+ <tr><td>drill.exec.sys.store.provider</td><td>ZooKeeper</td><td>Defines the persistent storage (PStore) provider. The PStore holds configuration and profile data. For more information about PStores, see <a href="/docs/persistent-configuration-storage">Persistent Configuration Storage</a>.</td></tr>
+ <tr><td>drill.exec.buffer.size</td><td></td><td>Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make it immediately available, which requires memory to hold the data pending operations. When data on a downstream operation is required, it is immediately available, so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.</td></tr>
+ <tr><td>drill.exec.sort.external.directories<br/>drill.exec.sort.external.fs</td><td></td><td>These options control spooling. The drill.exec.sort.external.directories option tells Drill which directory to use when spooling. The drill.exec.sort.external.fs option tells Drill which file system to use when spooling beyond memory files. Drill uses a spool-and-sort operation for beyond-memory operations. The sorting operation is designed to spool to a Hadoop file system; the default Hadoop file system is a local file system in the /tmp directory. Spooling performance (both writing and reading back) is constrained by the file system. For MapR clusters, use MapReduce volumes or set up local volumes to use for spooling. Volumes improve performance and stripe data across as many disks as possible.</td></tr>
+ <tr><td>drill.exec.debug.error_on_leak</td><td>True</td><td>Determines how Drill behaves when memory leaks occur during a query. By default, this option is enabled so that queries fail when memory leaks occur. If you disable the option, Drill issues a warning when a memory leak occurs and completes the query.</td></tr>
+ <tr><td>drill.exec.zk.connect</td><td>localhost:2181</td><td>Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.</td></tr>
+ <tr><td>drill.exec.cluster-id</td><td>my_drillbit_cluster</td><td>Identifies the cluster that corresponds with the ZooKeeper quorum indicated. It also provides Drill with the name of the cluster used during UDP multicast. You must change the default cluster-id if there are multiple clusters on the same subnet; otherwise the clusters will try to connect to each other to form one cluster.</td></tr>
+ </tbody></table>
 -
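+ 
+ You set these options in the `drill-override.conf` file on each Drillbit node. A minimal sketch, assuming an external three-node ZooKeeper quorum (the host names are placeholders):
+ 
+     drill.exec: {
+       cluster-id: "my_drillbit_cluster",
+       zk.connect: "zk1:2181,zk2:2181,zk3:2181"
+     }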

