drill-commits mailing list archives

From bridg...@apache.org
Subject [1/2] drill git commit: Daniel's review
Date Tue, 04 Aug 2015 23:20:50 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages f7515b69c -> 98dfbea8b


Daniel's review


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/1fc4d00c
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/1fc4d00c
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/1fc4d00c

Branch: refs/heads/gh-pages
Commit: 1fc4d00cfab6524a966285bbea19aac10fc59f9a
Parents: f7515b6
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Mon Jul 27 15:41:24 2015 -0700
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Mon Jul 27 15:43:20 2015 -0700

----------------------------------------------------------------------
 .../010-connect-a-data-source-introduction.md   |  4 +-
 .../020-storage-plugin-registration.md          |  8 +-
 .../035-plugin-configuration-basics.md          | 30 +++---
 .../040-file-system-storage-plugin.md           | 96 +++++++++++---------
 _docs/connect-a-data-source/050-workspaces.md   | 22 +++--
 .../060-hbase-storage-plugin.md                 |  7 +-
 .../070-hive-storage-plugin.md                  | 26 +++---
 .../080-drill-default-input-format.md           | 73 ++++++---------
 .../090-mongodb-plugin-for-apache-drill.md      | 39 +++-----
 .../050-json-data-model.md                      |  2 +-
 .../030-querying-plain-text-files.md            |  7 +-
 .../005-about-the-mapr-sandbox.md               |  9 +-
 12 files changed, 152 insertions(+), 171 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/010-connect-a-data-source-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/010-connect-a-data-source-introduction.md b/_docs/connect-a-data-source/010-connect-a-data-source-introduction.md
index 2ba46e3..7e4e2b5 100644
--- a/_docs/connect-a-data-source/010-connect-a-data-source-introduction.md
+++ b/_docs/connect-a-data-source/010-connect-a-data-source-introduction.md
@@ -2,9 +2,9 @@
 title: "Connect a Data Source Introduction"
 parent: "Connect a Data Source"
 ---
-A storage plugin is a software module for connecting Drill to data sources. A storage plugin
typically optimizes execution of Drill queries, provides the location of the data, and configures
the workspace and file formats for reading data. Several storage plugins are installed with
Drill that you can configure to suit your environment. Through the storage plugin, Drill connects
to a data source, such as a database, a file on a local or distributed file system, or a Hive
metastore. 
+A storage plugin is a software module for connecting Drill to data sources. A storage plugin
typically optimizes execution of Drill queries, provides the location of the data, and configures
the workspace and file formats for reading data. Several storage plugins are installed with
Drill that you can configure to suit your environment. Through a storage plugin, Drill connects
to a data source, such as a database, a file on a local or distributed file system, or a Hive
metastore. 
 
-You can modify the default configuration of a storage plugin X and give the new version a
unique name Y. This document refers to Y as a different storage plugin, although it is actually
just a reconfiguration of original interface. When you execute a query, Drill gets the storage
plugin name in one of several ways:
+You can modify the default configuration X of a storage plugin and give the new configuration
a unique name Y. This document refers to Y as a different storage plugin, although it is actually
just a reconfiguration of the original interface. When you execute a query, Drill gets the storage
plugin configuration name in one of several ways:
 
 * The FROM clause of the query can identify the plugin to use.
 * The USE <plugin name> command can precede the query.
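
The two lookup paths described in this hunk can be illustrated with a short, hypothetical query pair (the `dfs` plugin and the file path are assumptions used only for illustration):

```sql
-- Option 1: name the storage plugin configuration directly in the FROM clause.
SELECT * FROM dfs.`/tmp/example.json`;

-- Option 2: set the default schema with USE, then omit the plugin name.
USE dfs;
SELECT * FROM `/tmp/example.json`;
```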

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/020-storage-plugin-registration.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/020-storage-plugin-registration.md b/_docs/connect-a-data-source/020-storage-plugin-registration.md
index 73ff050..831f18d 100644
--- a/_docs/connect-a-data-source/020-storage-plugin-registration.md
+++ b/_docs/connect-a-data-source/020-storage-plugin-registration.md
@@ -2,11 +2,11 @@
 title: "Storage Plugin Registration"
 parent: "Connect a Data Source"
 ---
-You connect Drill to a file system, Hive, HBase, or other data source through a storage plugin.
On the Storage tab of the Web UI, you can view and reconfigure a storage plugin. You can create
a new name for the reconfigured version, thereby registering the new version. To open the
Storage tab, go to `http://<IP address>:8047/storage`, where IP address is any one of
the installed Drillbits in a distributed system or `localhost` in an embedded system:
+You connect Drill to a file system, Hive, HBase, or other data source through a storage plugin.
On the Storage tab of the Web UI, you can view and reconfigure a storage plugin. You can create
a new name for the reconfigured version, thereby registering the new version. To open the
Storage tab, go to `http://<IP address>:8047/storage`, where IP address is the host
name or IP address of one of the installed Drillbits in a distributed system or `localhost`
in an embedded system:
 
 ![drill-installed plugins]({{ site.baseurl }}/docs/img/plugin-default.png)
 
-The Drill installation registers the the `cp`, `dfs`, `hbase`, `hive`, and `mongo` storage
plugin configurations.
+The Drill installation registers the `cp`, `dfs`, `hbase`, `hive`, and `mongo` default storage
plugin configurations.
 
 * `cp`  
   Points to a JAR file in the Drill classpath that contains the Transaction Processing Performance
Council (TPC) benchmark schema TPC-H that you can query. 
@@ -20,7 +20,9 @@ point to any distributed file system, such as a Hadoop or S3 file system.
 * `mongo`  
    Provides a connection to MongoDB data.
 
-In the [Drill sandbox]({{site.baseurl}}/docs/about-the-mapr-sandbox/), the `dfs` storage
plugin connects you to a simulation of a distributed file system. If you install Drill, `dfs`
connects you to the root of your file system.
+In the [Drill sandbox]({{site.baseurl}}/docs/about-the-mapr-sandbox/), the `dfs` storage
plugin configuration connects you to a Hadoop environment pre-configured with Drill. If you
install Drill, `dfs` connects you to the root of your file system.
+
+## Storage Plugin Configuration Persistence
 
 Drill saves storage plugin configurations in a temporary directory (embedded mode) or in
ZooKeeper (distributed mode). The storage plugin configuration persists after upgrading, so
a configuration that you created in one version of Drill appears in the Drill Web UI of an
upgraded version of Drill. For example, on Mac OS X, Drill uses `/tmp/drill/sys.storage_plugins`
to store storage plugin configurations. To revert to the default storage plugins for a particular
version, in embedded mode, delete the contents of this directory and restart the Drill shell.
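
The revert procedure described in this hunk can be sketched for embedded mode as follows (the path is the Mac OS X default mentioned above; adjust for your platform):

```shell
# Remove the persisted storage plugin configurations (embedded mode only).
rm -rf /tmp/drill/sys.storage_plugins

# Restart the Drill shell so the default configurations are re-registered
# (assumes you start embedded Drill from the installation directory):
# bin/drill-embedded
```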
 

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/035-plugin-configuration-basics.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/035-plugin-configuration-basics.md b/_docs/connect-a-data-source/035-plugin-configuration-basics.md
index 3319581..9eda181 100644
--- a/_docs/connect-a-data-source/035-plugin-configuration-basics.md
+++ b/_docs/connect-a-data-source/035-plugin-configuration-basics.md
@@ -2,12 +2,14 @@
 title: "Plugin Configuration Basics"
 parent: "Storage Plugin Configuration"
 ---
-When you add or update storage plugin instances on one Drill node in a 
+When you add or update storage plugin configurations on one Drill node in a 
 cluster having multiple installations of Drill, Drill broadcasts the information to other
Drill nodes 
 to synchronize the storage plugin configurations. You do not need to
-restart any of the Drillbits when you add or update a storage plugin instance.
+restart any of the Drillbits when you add or update a storage plugin configuration.
 
-Use the Drill Web UI to update or add a new storage plugin configuration. Launch a web browser,
go to: `http://<IP address or host name>:8047`, and then go to the Storage tab. 
+## Using the Drill Web UI
+
+Use the Drill Web UI to update or add a new storage plugin configuration. The Drill shell
needs to be running to access the Drill Web UI. To open the Drill Web UI, launch a web browser,
and go to `http://<IP address or host name>:8047`, using the address of any Drillbit in the cluster. Select
the Storage tab to view, update, or add a new storage plugin configuration. 
 
 To create a name and new configuration:
 
@@ -46,7 +48,7 @@ The following table describes the attributes you configure for storage plugins
i
     <td>"connection"</td>
     <td>"classpath:///"<br>"file:///"<br>"mongodb://localhost:27017/"<br>"hdfs://"</td>
     <td>implementation-dependent</td>
-    <td>Type of distributed file system, such as HDFS, Amazon S3, or files in your
file system.</td>
+    <td>The type of distributed file system, such as HDFS, Amazon S3, or files in your
file system, and an address/path name.</td>
   </tr>
   <tr>
     <td>"workspaces"</td>
@@ -70,13 +72,13 @@ The following table describes the attributes you configure for storage
plugins i
     <td>"workspaces". . . "defaultInputFormat"</td>
     <td>null<br>"parquet"<br>"csv"<br>"json"</td>
     <td>no</td>
-    <td>Format for reading data, regardless of extension. Default = Parquet.</td>
+    <td>Format for reading data, regardless of extension. Default = "parquet"</td>
   </tr>
   <tr>
     <td>"formats"</td>
     <td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"avro"<br>"maprdb"
*</td>
     <td>yes</td>
-    <td>One or more valid file formats for reading. Drill implicitly detects formats
of some files based on extension or bits of data in the file, others require configuration.</td>
+    <td>One or more valid file formats for reading. Drill implicitly detects formats
of some files based on extension or bits of data in the file; others require configuration.</td>
   </tr>
   <tr>
     <td>"formats" . . . "type"</td>
@@ -88,13 +90,13 @@ The following table describes the attributes you configure for storage
plugins i
     <td>formats . . . "extensions"</td>
     <td>["csv"]</td>
     <td>format-dependent</td>
-    <td>Extensions of the files that Drill can read.</td>
+    <td>File name extensions that Drill can read.</td>
   </tr>
   <tr>
     <td>"formats" . . . "delimiter"</td>
     <td>"\t"<br>","</td>
     <td>format-dependent</td>
-    <td>One or more characters that serve as a record seperator in a delimited text
file, such as CSV. Use a 4-digit hex ascii code syntax \uXXXX for a non-printable delimiter.
</td>
+    <td>Sequence of one or more characters that serve as a record separator in a delimited
text file, such as CSV. Use a 4-digit hex code syntax \uXXXX for a non-printable delimiter.
</td>
   </tr>
   <tr>
     <td>"formats" . . . "quote"</td>
@@ -148,7 +150,7 @@ Drill provides a REST API that you can use to create a storage plugin
configurat
   The storage plugin configuration name. 
 
 * config  
-  The attribute settings as you would enter it in the Web UI.
+  The attribute settings as entered in the Web UI.
 
 For example, this command creates a storage plugin named myplugin for reading files of an
unknown type located on the root of the file system:
 
@@ -156,13 +158,13 @@ For example, this command creates a storage plugin named myplugin for
reading fi
 
 ## Bootstrapping a Storage Plugin
 
-If you need to add a storage plugin to Drill and do not want to use a web browser, you can
create a [bootstrap-storage-plugins.json](https://github.com/apache/drill/blob/master/contrib/storage-hbase/src/main/resources/bootstrap-storage-plugins.json)
file and include it on the classpath when starting Drill. The storage plugin loads when Drill
starts up.
+If you need to add a storage plugin configuration to Drill and do not want to use a web browser,
you can create a [bootstrap-storage-plugins.json](https://github.com/apache/drill/blob/master/contrib/storage-hbase/src/main/resources/bootstrap-storage-plugins.json)
file and include it on the classpath when starting Drill. The storage plugin configuration
loads when Drill starts up.
 
-Bootstrapping a storage plugin works only when the first drillbit in the cluster first starts
up. The configuration is
-stored in zookeeper, preventing Drill from picking up the boostrap-storage-plugins.json again.
+Bootstrapping a storage plugin configuration works only when the first Drillbit in the cluster
first starts up. The configuration is
+stored in ZooKeeper, preventing Drill from picking up the bootstrap-storage-plugins.json
again.
 
 After cluster startup, you have to use the REST API or Drill Web UI to add a storage plugin
configuration. Alternatively, you
-can modify the entry in zookeeper by uploading the json file for
+can modify the entry in ZooKeeper by uploading the JSON file for
that plugin to the /drill directory of the ZooKeeper installation, or by just deleting the
/drill directory if you do not have configuration properties to preserve.
 
-If you configure an HBase storage plugin using bootstrap-storage-plugins.json file and HBase
is not installed, you might experience a delay when executing the queries. Configure the [HBase
client timeout](http://hbase.apache.org/book.html#config.files) and retry settings in the
config block of HBase plugin instance configuration.
+If you load an HBase storage plugin configuration using the bootstrap-storage-plugins.json file
and HBase is not installed, you might experience a delay when executing the queries. Configure
the [HBase client timeout](http://hbase.apache.org/book.html#config.files) and retry settings
in the config block of the HBase plugin configuration.

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/040-file-system-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/040-file-system-storage-plugin.md b/_docs/connect-a-data-source/040-file-system-storage-plugin.md
index 7d299d2..3380d28 100644
--- a/_docs/connect-a-data-source/040-file-system-storage-plugin.md
+++ b/_docs/connect-a-data-source/040-file-system-storage-plugin.md
@@ -2,63 +2,71 @@
 title: "File System Storage Plugin"
 parent: "Storage Plugin Configuration"
 ---
-You can register a storage plugin instance that connects Drill to a local file system or
to a distributed file system registered in `core-site.xml`, such as S3
+You can register a storage plugin configuration that connects Drill to a local file system
or to a distributed file system registered in the Hadoop `core-site.xml`, such as S3
 or HDFS. By
-default, Apache Drill includes an storage plugin named `dfs` that points to the local file
+default, Apache Drill includes a storage plugin configuration named `dfs` that points to
the local file
 system on your machine by default. 
 
 ## Connecting Drill to a File System
 
-In a Drill cluster, you typically do not query the local file system, but instead place files
on the distributed file system. You configure the connection property of the storage plugin
workspace to connect Drill to a distributed file system. For example, the following connection
properties connect Drill to an HDFS cluster from a client:
+In a Drill cluster, you typically do not query the local file system, but instead place files
on the distributed file system. You configure the connection property of the storage plugin
workspace to connect Drill to a distributed file system. For example, the following connection
property connects Drill to an HDFS cluster from a client:
 
 `"connection": "hdfs://<IP Address>:<Port>/"`   
 
-To query a file on HDFS from a node on the cluster, you can simply change the connection
to from `file:///` to `hdfs://` in the `dfs` storage plugin.
+To query a file on HDFS from a node on the cluster, you can simply change the connection
from `file:///` to `hdfs://` in the `dfs` storage plugin.
+
+To change the `dfs` storage plugin configuration to point to a different local or distributed
file system, use `connection` attributes as shown in the following examples.
 
-To change the `dfs` storage plugin configuration to point to a local or a distributed file
system, use `connection` attributes as shown in the following example.
 * Local file system example:
 
-    {
-      "type": "file",
-      "enabled": true,
-      "connection": "file:///",
-      "workspaces": {
-        "root": {
-          "location": "/user/max/donuts",
-          "writable": false,
-          "defaultInputFormat": null
-         }
-      },
-         "formats" : {
-           "json" : {
-             "type" : "json"
-           }
-         }
+  ```
+  {
+    "type": "file",
+    "enabled": true,
+    "connection": "file:///",
+    "workspaces": {
+      "root": {
+        "location": "/user/max/donuts",
+        "writable": false,
+        "defaultInputFormat": null
+       }
+    },
+    "formats" : {
+      "json" : {
+        "type" : "json"
       }
+    }
+  }
+  ```
+
 * Distributed file system example:
-    
-    {
-      "type" : "file",
-      "enabled" : true,
-      "connection" : "hdfs://10.10.30.156:8020/",
-      "workspaces" : {
-        "root" : {
-          "location" : "/user/root/drill",
-          "writable" : true,
-          "defaultInputFormat" : null
-        }
-      },
-      "formats" : {
-        "json" : {
-          "type" : "json"
-        }
+
+  ```
+  {
+    "type" : "file",
+    "enabled" : true,
+    "connection" : "hdfs://10.10.30.156:8020/",
+    "workspaces" : {
+      "root" : {
+        "location" : "/user/root/drill",
+        "writable" : true,
+        "defaultInputFormat" : null
+      }
+    },
+    "formats" : {
+      "json" : {
+        "type" : "json"
       }
     }
+  }
+  ```
+
+To connect to a Hadoop file system, you include the IP address and port number of the
+name node.
 
-To connect to a Hadoop file system, you include the IP address of the
-name node and the port number.
+### Querying Donuts Example
 
-The following example shows an file type storage plugin configuration with a
+The following example shows a file type storage plugin configuration with a
 workspace named `json_files`. The configuration points Drill to the
 `/users/max/drill/json/` directory in the local file system `(dfs)`:
 
@@ -74,18 +82,16 @@ workspace named `json_files`. The configuration points Drill to the
        } 
     },
 
-The `connection` parameter in this configuration is "`file:///`", connecting Drill to the
local file system (`dfs`).
+The `connection` parameter in this configuration is "`file:///`", connecting Drill to the
local file system.
 
 To query a file in the example `json_files` workspace, you can issue the `USE`
 command to tell Drill to use the `json_files` workspace configured in the `dfs`
 instance for each query that you issue:
 
-**Example**
-
     USE dfs.json_files;
-    SELECT * FROM dfs.json_files.`donuts.json` WHERE type='frosted'
+    SELECT * FROM `donuts.json` WHERE type='frosted'
 
 If the `json_files` workspace did not exist, the query would have to include the
-full path to the `donuts.json` file:
+full file path name to the `donuts.json` file:
 
     SELECT * FROM dfs.`/users/max/drill/json/donuts.json` WHERE type='frosted';
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/050-workspaces.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/050-workspaces.md b/_docs/connect-a-data-source/050-workspaces.md
index b535267..258e3fd 100644
--- a/_docs/connect-a-data-source/050-workspaces.md
+++ b/_docs/connect-a-data-source/050-workspaces.md
@@ -2,26 +2,28 @@
 title: "Workspaces"
 parent: "Storage Plugin Configuration"
 ---
-You can define one or more workspaces in a storage plugin configuration. The workspace defines
the directory location of files in a local or distributed file system. Drill searches the
workspace to locate data when
+You can define one or more workspaces in a storage plugin configuration. The workspace defines
the location of files in subdirectories of a local or distributed file system. Drill searches
the workspace to locate data when
 you run a query. The `default`
 workspace points to the root of the file system. 
 
-Configuring `workspaces` to include a file location simplifies the query, which is important
when querying the same data source repeatedly. After you configure a long path name in the
workspaces location property, instead of
-using the full path to the data source, you use dot notation in the FROM
+Configuring workspaces to include a subdirectory simplifies the query, which is important
when querying the same files repeatedly. After you configure a long path name in the workspace
`location` property, instead of
+using the full path name to the data source, you use dot notation in the FROM
 clause.
 
-``<workspaces>.`<location>```
+``<workspace name>.`<location>```
 
-To query the data source while you are not *using* that storage plugin, include the plugin
name. This syntax assumes you did not issue a USE statement to connect to a storage plugin
that defines the
+Where `<location>` is the path name of a subdirectory, such as `/users/max/drill/json`
enclosed in double quotation marks as shown in the ["Querying Donuts Example."](/docs/file-system-storage-plugin/#querying-donuts-example)
+
+To query the data source when you have not set the storage plugin configuration as the default
schema, include the plugin name. This syntax assumes you did not issue a USE statement
to connect to a storage plugin that defines the
 location of the data:
 
-``<plugin>.<workspaces>.`<location>```
+``<plugin>.<workspace name>.`<location>```
 
 
 ## No Workspaces for Hive and HBase
 
-You cannot configure workspaces for
-`hive` and `hbase`, though Hive databases show up as workspaces in
+You cannot include workspaces in the configurations of the
+`hive` and `hbase` plugins installed with Apache Drill, though Hive databases show up as
workspaces in
 Drill. Each `hive` instance includes a `default` workspace that points to the  Hive metastore.
When you query
 files and tables in the `hive default` workspaces, you can omit the
 workspace name from the query.
@@ -34,9 +36,9 @@ using either of the following queries and get the same results:
     SELECT * FROM hive.customers LIMIT 10;
     SELECT * FROM hive.`default`.customers LIMIT 10;
 
-{% include startnote.html %}Default is a reserved word. You must enclose reserved words in
back ticks.{% include endnote.html %}
+{% include startnote.html %}Default is a reserved word. You must enclose reserved words used
as identifiers in backticks.{% include endnote.html %}
 
-Because the HBase storage plugin configuration does not have a workspace, you can use the
following
+Because the HBase storage plugin does not accommodate a workspace, you can use the following
 query:
 
     SELECT * FROM hbase.customers LIMIT 10;
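
To make the dot notation in this file's changes concrete, here is a hypothetical workspace fragment; the `sales` name and path are assumptions, following the configuration shape used elsewhere in these docs:

```json
"workspaces": {
  "sales": {
    "location": "/users/max/drill/sales",
    "writable": false,
    "defaultInputFormat": "json"
  }
}
```

With this fragment in the `dfs` configuration, `SELECT * FROM dfs.sales.`orders.json`` reads `/users/max/drill/sales/orders.json` without spelling out the full path.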

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/060-hbase-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/060-hbase-storage-plugin.md b/_docs/connect-a-data-source/060-hbase-storage-plugin.md
index 488a564..d97feab 100644
--- a/_docs/connect-a-data-source/060-hbase-storage-plugin.md
+++ b/_docs/connect-a-data-source/060-hbase-storage-plugin.md
@@ -2,12 +2,9 @@
 title: "HBase Storage Plugin"
 parent: "Storage Plugin Configuration"
 ---
-Specify a ZooKeeper quorum to connect
-Drill to an HBase data source. Drill supports HBase version 0.98.
+When connecting Drill to an HBase data source using the HBase storage plugin installed with
Drill, you need to specify a ZooKeeper quorum. Drill supports HBase version 0.98.
 
-To HBase storage plugin configuration installed with Drill appears as follows when you navigate
to [http://localhost:8047](http://localhost:8047/), and select the **Storage** tab.
-
-     **Example**  
+To view or change the HBase storage plugin configuration, use the [Drill Web UI]({{ site.baseurl
}}/docs/plugin-configuration-basics/#using-the-drill-web-ui). In the Web UI, select the **Storage**
tab, and then click the **Update** button for the `hbase` storage plugin configuration. The
following example shows a typical HBase storage plugin:
 
             {
               "type": "hbase",

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/070-hive-storage-plugin.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/070-hive-storage-plugin.md b/_docs/connect-a-data-source/070-hive-storage-plugin.md
index c7ab31f..83b9e43 100644
--- a/_docs/connect-a-data-source/070-hive-storage-plugin.md
+++ b/_docs/connect-a-data-source/070-hive-storage-plugin.md
@@ -7,22 +7,22 @@ using custom SerDes or InputFormat/OutputFormat, all nodes running Drillbits
 must have the SerDes or InputFormat/OutputFormat `JAR` files in the 
 `<drill_installation_directory>/jars/3rdparty` folder.
 
-## Hive Remote Metastore
+## Hive Remote Metastore Configuration
 
-In this configuration, the Hive metastore runs as a separate service outside
+In the remote metastore configuration, the Hive metastore runs as a separate service outside
 of Hive. Drill communicates with the Hive metastore through Thrift. The
 metastore service communicates with the Hive database over JDBC. Point Drill
 to the Hive metastore service address, and provide the connection parameters
-in the Drill Web UI to configure a connection to Drill.
+in a Hive storage plugin configuration to connect Drill to the metastore.
 
 {% include startnote.html %}Verify that the Hive metastore service is running before you
register the Hive metastore.{% include endnote.html %}  
 
-To configure a remote Hive metastore, complete the following steps:
+To register a remote Hive metastore with Drill:
 
 1. Issue the following command to start the Hive metastore service on the system specified
in the `hive.metastore.uris`:
    `hive --service metastore`
-2. Navigate to `http://<host>:8047`, and select the **Storage** tab.
-3. In the disabled storage plugins section, click **Update** next to the `hive` instance.
+2. In the [Drill Web UI]({{ site.baseurl }}/docs/plugin-configuration-basics/#using-the-drill-web-ui),
select the **Storage** tab.
+3. In the list of disabled storage plugins in the Drill Web UI, click **Update** next to
the `hive` instance. For example:
 
         {
           "type": "hive",
@@ -35,15 +35,13 @@ To configure a remote Hive metastore, complete the following steps:
             "hive.metastore.sasl.enabled": "false"
           }
         }
-4. In the configuration window, add the `Thrift URI` and port to `hive.metastore.uris`. 
+4. In the configuration window, add the `Thrift URI` and port to `hive.metastore.uris`. For
example:
 
-    **Example**
-     
           ...
              "configProps": {
              "hive.metastore.uris": "thrift://<host>:<port>",
           ...
-5. Change the default location of files to suit your environment, for example, change `"fs.default.name":
"file:///"` to one of these locations:
+5. Change the default location of files to suit your environment; for example, change `"fs.default.name"`
property from `"file:///"` to one of these locations:
    * `hdfs://`
    * `hdfs://<authority>:<port>`
 6. If you are running Drill and Hive in a secure MapR cluster, remove the following line
from the configuration:  
@@ -54,9 +52,9 @@ To configure a remote Hive metastore, complete the following steps:
 
 After configuring a Hive storage plugin, you can [query Hive tables]({{ site.baseurl }}/docs/querying-hive/).
 
-## Hive Embedded Metastore
+## Hive Embedded Metastore Configuration
 
-In this configuration, the Hive metastore is embedded within the Drill process. Configure
an embedded metastore only in a cluster that runs a single Drillbit and only for testing purposes.
Do not embed the Hive metastore in production systems.
+In the embedded metastore configuration, the Hive metastore runs within the Drill process. Configure an embedded
metastore only in a cluster that runs a single Drillbit and only for testing purposes. Do
not embed the Hive metastore in production systems.
 
 Provide the metastore database configuration settings in the Drill Web UI. Before you configure
an embedded Hive metastore, verify that the driver you use to connect to the Hive metastore
is in the Drill classpath located in `/<drill installation directory>/lib/.` If the
driver is not there, copy the driver to `/<drill
 installation directory>/lib` on the Drill node. For more information about storage types
and configurations, refer to ["Hive Metastore Administration"](https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin).
@@ -64,7 +62,7 @@ installation directory>/lib` on the Drill node. For more information
about stora
 To configure an embedded Hive metastore, complete the following
 steps:
 
-1. Navigate to `http://<host>:8047`, and select the **Storage** tab.
+1. In the [Drill Web UI]({{ site.baseurl }}/docs/plugin-configuration-basics/#using-the-drill-web-ui),
select the **Storage** tab.
 2. In the disabled storage plugins section, click **Update** next to `hive` instance.
 3. In the configuration window, add the database configuration settings.
 
@@ -81,6 +79,6 @@ steps:
               "hive.metastore.sasl.enabled": "false"
             }
           }
-5. Change the `"fs.default.name":` attribute to specify the default location of files. The
value needs to be a URI that is available and capable of handling file system requests. For
example, change the local file system URI `"file:///"` to the HDFS URI: `hdfs://`, or to the
path on HDFS with a namenode: `hdfs://<authority>:<port>`
+5. Change the `"fs.default.name"` attribute to specify the default location of files. The
value needs to be a URI that is available and capable of handling file system requests. For
example, change the local file system URI `"file:///"` to the HDFS URI: `hdfs://`, or to the
path on HDFS with a namenode: `hdfs://<authority>:<port>`
 6. Click **Enable**.
   
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/080-drill-default-input-format.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/080-drill-default-input-format.md b/_docs/connect-a-data-source/080-drill-default-input-format.md
index 960512d..fd6b768 100644
--- a/_docs/connect-a-data-source/080-drill-default-input-format.md
+++ b/_docs/connect-a-data-source/080-drill-default-input-format.md
@@ -3,62 +3,47 @@ title: "Drill Default Input Format"
 parent: "Storage Plugin Configuration"
 ---
 You can define a default input format to tell Drill what file type exists in a
-workspace within a file system. Drill determines the file type based on file
-extensions and magic numbers when searching a workspace.
+workspace within a file system. 
 
-Magic numbers are file signatures that Drill uses to identify Parquet files.
-If Drill cannot identify the file type based on file extensions or magic
+Normally, Drill determines the file type based on file
+extensions and *magic numbers* when searching a workspace. Magic numbers are file signatures
that Drill uses to identify Parquet files. If Drill cannot identify the file type based on
file extensions or magic
 numbers, the query fails. Defining a default input format can prevent queries
 from failing in situations where Drill cannot determine the file type.
 
-If you incorrectly define the file type in a workspace and Drill cannot
-determine the file type, the query fails. For example, if JSON files do not have a `.json`
extension, the query fails.
-
-You can define one default input format per workspace. If you do not define a
-default input format, and Drill cannot detect the file format, the query
-fails. You can define a default input format for any of the file types that
-Drill supports. Currently, Drill supports the following types:
+If you do not define the default file type in a workspace, or define it incorrectly, and
+Drill cannot otherwise determine the file type, the query fails. You can define one default
+input format per workspace, using any of the file types that Drill supports. Currently,
+Drill supports the following input types:
 
   * Avro
   * CSV, TSV, or PSV
   * Parquet
   * JSON
-  * MapR-DB*
-
-\* Only available when you install Drill on a cluster using the mapr-drill package.
-
-## Defining a Default Input Format
 
-You define the default input format for a file system workspace through the
-Drill Web UI. You must have a [defined workspace]({{ site.baseurl }}/docs/workspaces) before
you can define a
-default input format.
+You must have a [defined workspace]({{ site.baseurl }}/docs/workspaces) before you can define
a default input format.
 
-To define a default input format for a workspace, complete the following
-steps:
+To define a default input format for a workspace:
 
-  1. Navigate to the Drill Web UI at `<drill_node_ip_address>:8047`. The Drillbit process
must be running on the node before you connect to the Drill Web UI.
+  1. Navigate to the [Drill Web UI]({{ site.baseurl }}/docs/plugin-configuration-basics/#using-the-drill-web-ui).
The Drillbit process must be running on the node before you connect to the Drill Web UI.
   2. Select **Storage** in the toolbar.
-  3. Click **Update** next to the storage plugin for which you want to define a default input
format for a workspace.
+  3. Click **Update** next to the storage plugin configuration that contains the workspace for which you want to define a default input format.
   4. In the Configuration area, locate the workspace, and change the `defaultInputFormat`
attribute to any of the supported file types.
 
-     **Example**
-     
-        {
-          "type": "file",
-          "enabled": true,
-          "connection": "hdfs://",
-          "workspaces": {
-            "root": {
-              "location": "/drill/testdata",
-              "writable": false,
-              "defaultInputFormat": csv
-          },
-          "local" : {
-            "location" : "/max/proddata",
-            "writable" : true,
-            "defaultInputFormat" : "json"
-        }
-
-## Querying Compressed Files
-
-You can query compressed GZ files, such as JSON and CSV, as well as uncompressed files. The
file extension specified in the `formats . . . extensions` property of the storage plugin
configuration must precede the gz extension in the file name. For example, `proddata.json.gz`
or `mydata.csv.gz` are valid file names to use in a query, as shown in the example in ["Querying
the GZ File Directly"]({{site.baseurl"}}/docs/querying-plain-text-files/#query-the-gz-file-directly).
+### Example of Defining a Default Input Format
+
+```
+{
+  "type": "file",
+  "enabled": true,
+  "connection": "hdfs://",
+  "workspaces": {
+    "root": {
+      "location": "/drill/testdata",
+      "writable": false,
+      "defaultInputFormat": "csv"
+    },
+    "local": {
+      "location": "/max/proddata",
+      "writable": true,
+      "defaultInputFormat": "json"
+    }
+  }
+}
+```
\ No newline at end of file
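The example configuration above can be sanity-checked with a short script. This is an illustrative sketch, not part of Drill: the helper name and the accepted-format set are assumptions based on the file types this page lists, and Drill's own validation may accept different values.

```python
import json

# Input formats this page lists as supported for defaultInputFormat
# (illustrative; Drill's accepted values may differ).
SUPPORTED = {"avro", "csv", "tsv", "psv", "parquet", "json"}

def check_default_input_formats(config_text):
    """Return workspace names whose defaultInputFormat is unrecognized."""
    config = json.loads(config_text)
    bad = []
    for name, workspace in config.get("workspaces", {}).items():
        fmt = workspace.get("defaultInputFormat")
        if fmt is not None and fmt not in SUPPORTED:
            bad.append(name)
    return bad

example = """
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://",
  "workspaces": {
    "root": {"location": "/drill/testdata", "writable": false,
             "defaultInputFormat": "csv"},
    "local": {"location": "/max/proddata", "writable": true,
              "defaultInputFormat": "json"}
  }
}
"""
print(check_default_input_formats(example))  # []
```

An empty result means every workspace declares a format from the supported list.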

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/connect-a-data-source/090-mongodb-plugin-for-apache-drill.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/090-mongodb-plugin-for-apache-drill.md b/_docs/connect-a-data-source/090-mongodb-plugin-for-apache-drill.md
index afd6ee2..7e439e2 100644
--- a/_docs/connect-a-data-source/090-mongodb-plugin-for-apache-drill.md
+++ b/_docs/connect-a-data-source/090-mongodb-plugin-for-apache-drill.md
@@ -4,18 +4,16 @@ parent: "Connect a Data Source"
 ---
 ## Overview
 
-Drill supports MongoDB 3.0, providing a mongodb format plugin to connect to MongoDB using
MongoDB's latest Java driver. You can run queries
-to read, but not write, the Mongo data using Drill. Attempting to write data back to Mongo
results in an error. You do not need any upfront schema definitions. 
+Drill supports MongoDB 3.0, providing a mongodb storage plugin to connect to MongoDB using
MongoDB's latest Java driver. You can run queries
+to read, but not write, Mongo data using Drill. Attempting to write data back to Mongo results
in an error. You do not need any upfront schema definitions. 
 
-{% include startnote.html %}A local instance of Drill is used in this tutorial for simplicity.
{% include endnote.html %}
+{% include startnote.html %}For simplicity, the following examples use a local instance of Drill.{% include endnote.html %}
 
 You can also run Drill and MongoDB together in distributed mode.
 
 ### Before You Begin
 
-Before you can query MongoDB with Drill, you must have Drill and MongoDB
-installed on your machine. Examples in this tutorial use zip code aggregation data
-provided by MongoDB that you download in the following steps:
+To query MongoDB with Drill, you install Drill and MongoDB, and then you import zip code
aggregation data into MongoDB. 
 
   1. [Install Drill]({{ site.baseurl }}/docs/installing-drill-in-embedded-mode), if you do
not already have it installed.
   2. [Install MongoDB](http://docs.mongodb.org/manual/installation), if you do not already
have it installed.
@@ -23,20 +21,14 @@ provided by MongoDB that you download in the following steps:
 
 ## Configuring MongoDB
 
-Start Drill and configure the MongoDB storage plugin in the Drill Web
-UI to connect to Drill. Drill must be running in order to access the Web UI.
-
-Complete the following steps to configure MongoDB as a data source for Drill:
+Drill must be running before you can access the Web UI to configure a storage plugin. Start Drill, and then view and enable the MongoDB storage plugin configuration as follows:
 
   1. [Start the Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/).
 
      The Drill shell needs to be running to access the Drill Web UI.
-  2. Open a browser window, and navigate to the Drill Web UI at `http://localhost:8047`.
-  3. In the navigation bar, click **Storage**.
-  4. Under Disabled Storage Plugins, select **Update** next to the `mongo` storage plugin.
-  5. In the Configuration window, verify that `"enabled"` is set to ``"true."``
-
-     **Example**
+  2. In the [Drill Web UI]({{ site.baseurl }}/docs/plugin-configuration-basics/#using-the-drill-web-ui),
select the **Storage** tab.
+  3. Under Disabled Storage Plugins, click **Update** next to the `mongo` storage plugin configuration.
+  4. In the Configuration window, review the default configuration:
      
         {
           "type": "mongo",
@@ -49,7 +41,7 @@ Complete the following steps to configure MongoDB as a data source for Drill:
 
 ## Querying MongoDB
 
-In the Drill shell, you can issue the `SHOW DATABASES `command to see a list of databases
from all
+In the [Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/), you can
issue the `SHOW DATABASES` command to see a list of schemas from all
 Drill data sources, including MongoDB. If you downloaded the zip codes file,
 you should see `mongo.zipdb` in the results.
 
@@ -66,16 +58,11 @@ you should see `mongo.zipdb` in the results.
     | INFORMATION_SCHEMA |
     +--------------------+
 
-If you want all queries that you submit to run on `mongo.zipdb`, you can issue
+If you want all queries that you submit to default to `mongo.zipdb`, you can issue
 the `USE` command to change schema.
 
 ### Example Queries
 
-The following example queries are included for reference. However, you can use
-the SQL power of Apache Drill directly on MongoDB. For more information about,
-refer to the [SQL
-Reference]({{ site.baseurl }}/docs/sql-reference).
-
 **Example 1: View mongo.zipdb Dataset**
 
     0: jdbc:drill:zk=local> SELECT * FROM zipcodes LIMIT 10;
@@ -147,7 +134,5 @@ Reference]({{ site.baseurl }}/docs/sql-reference).
 
 ## Using ODBC/JDBC Drivers
 
-You can leverage the power of Apache Drill to query MongoDB through standard
-BI tools, such as Tableau and SQuirreL.
-
-For information about Drill ODBC and JDBC drivers, refer to [Drill Interfaces]({{ site.baseurl
}}/docs/odbc-jdbc-interfaces).
+You can query MongoDB through standard
+BI tools, such as Tableau and SQuirreL. For information about Drill ODBC and JDBC drivers,
refer to [Drill Interfaces]({{ site.baseurl }}/docs/odbc-jdbc-interfaces).

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/data-sources-and-file-formats/050-json-data-model.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/050-json-data-model.md b/_docs/data-sources-and-file-formats/050-json-data-model.md
index 75f47b1..59618f8 100644
--- a/_docs/data-sources-and-file-formats/050-json-data-model.md
+++ b/_docs/data-sources-and-file-formats/050-json-data-model.md
@@ -12,7 +12,7 @@ Semi-structured JSON data often consists of complex, nested elements having
sche
 
 Using Drill you can natively query dynamic JSON data sets using SQL. Drill treats a JSON
object as a SQL record. One object equals one row in a Drill table.
 
-You can also [query compressed .gz files]({{ site.baseurl }}/docs/drill-default-input-format#querying-compressed-json)
having JSON as well as uncompressed .json files.
+You can also [query compressed .gz files]({{ site.baseurl }}/docs/querying-plain-text-files/#querying-compressed-files)
having JSON as well as uncompressed .json files.
 
 In addition to the examples presented later in this section, see ["How to Analyze Highly
Dynamic Datasets with Apache Drill"](https://www.mapr.com/blog/how-analyze-highly-dynamic-datasets-apache-drill)
for information about how to analyze a JSON data set.
 

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
index c17ac33..07e1e03 100644
--- a/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
+++ b/_docs/query-data/query-a-file-system/030-querying-plain-text-files.md
@@ -196,8 +196,11 @@ times a year in the books that Google scans.
 
 The Drill default storage plugins support common file formats. 
 
+## Querying Compressed Files
 
-## Query the GZ File Directly
+You can query compressed GZ files that contain data such as JSON or CSV, as well as uncompressed files. The
file extension specified in the `formats . . . extensions` property of the storage plugin
configuration must precede the gz extension in the file name. For example, `proddata.json.gz`
and `mydata.csv.gz` are valid file names to use in a query, as shown in the next example.
+
+### Query the GZ File Directly
 
 This example covers how to query the GZ file containing the compressed TSV data. The GZ file
name needs to be renamed to specify the type of delimited file, such as CSV or TSV. You add
`.tsv` before the `.gz` extension in this example.
 
@@ -214,3 +217,5 @@ This example covers how to query the GZ file containing the compressed
TSV data.
 
      The 5 rows of output appear.  
 
+
+
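The naming rule described above, that the data-format extension must come immediately before the `.gz` suffix, can be sketched as a small check. The helper and its default format list are illustrative assumptions, not part of Drill.

```python
# Minimal sketch of the gz naming rule: the delimited-format extension
# (csv, tsv, psv, json, ...) must appear right before the .gz suffix.
def has_format_before_gz(filename, formats=("csv", "tsv", "psv", "json")):
    parts = filename.lower().split(".")
    return len(parts) >= 3 and parts[-1] == "gz" and parts[-2] in formats

print(has_format_before_gz("proddata.json.gz"))  # True
print(has_format_before_gz("mydata.csv.gz"))     # True
print(has_format_before_gz("data.gz"))           # False
```

A name like `data.gz` fails the check, which mirrors why such a file must be renamed (for example to `data.tsv.gz`) before querying it.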

http://git-wip-us.apache.org/repos/asf/drill/blob/1fc4d00c/_docs/tutorials/learn-drill-with-the-mapr-sandbox/005-about-the-mapr-sandbox.md
----------------------------------------------------------------------
diff --git a/_docs/tutorials/learn-drill-with-the-mapr-sandbox/005-about-the-mapr-sandbox.md
b/_docs/tutorials/learn-drill-with-the-mapr-sandbox/005-about-the-mapr-sandbox.md
index c1b2376..01bede8 100644
--- a/_docs/tutorials/learn-drill-with-the-mapr-sandbox/005-about-the-mapr-sandbox.md
+++ b/_docs/tutorials/learn-drill-with-the-mapr-sandbox/005-about-the-mapr-sandbox.md
@@ -2,12 +2,11 @@
 title: "About the MapR Sandbox"
 parent: "Learn Drill with the MapR Sandbox"
 ---
-This tutorial uses the MapR Sandbox, which is a Hadoop environment pre-
-configured with Apache Drill. MapR includes Apache Drill as part of the Hadoop distribution.
The MapR
-Sandbox with Apache Drill is a fully functional single-node cluster that can
-be used to get an overview on Apache Drill in a Hadoop environment. Business
+This tutorial uses the MapR Sandbox, which is a Hadoop environment pre-configured with Drill.
MapR includes Drill as part of the Hadoop distribution. The MapR
+Sandbox with Drill is a fully functional single-node cluster that can
+be used to get an overview of Drill in a Hadoop environment. Business
 and technical analysts, product managers, and developers can use the sandbox
-environment to get a feel for the power and capabilities of Apache Drill by
+environment to get a feel for the power and capabilities of Drill by
 performing various types of queries. 
 
 Hadoop is not a prerequisite for Drill and users can start ramping

