drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: delete obsolete parquet metadata caching
Date Mon, 05 Oct 2015 23:06:16 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages eee4fae0e -> 492ed5a29


delete obsolete parquet metadata caching

DFS to dfs, plug names are case-sensitive

MD-396

minor edit

hide web ui stuff

metadata caching rewrite

typo

fixes to TSV quote

1.2 rn

minor edit

fix spacing

Suresh's change

hide rn, typo web sec


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/492ed5a2
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/492ed5a2
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/492ed5a2

Branch: refs/heads/gh-pages
Commit: 492ed5a29e4566af336d3ad1ad22c965bb82f41b
Parents: eee4fae
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Thu Oct 1 16:48:02 2015 -0700
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Mon Oct 5 15:39:55 2015 -0700

----------------------------------------------------------------------
 _data/docs.json                                 | 75 ++++++++++++++++----
 ...-configuring-web-ui-and-rest-api-security.md | 10 +--
 .../120-configuring-the-drill-shell.md          |  2 +-
 _docs/connect-a-data-source/050-workspaces.md   |  4 +-
 .../060-text-files-csv-tsv-psv.md               | 16 +++++
 _docs/install/060-starting-the-web-ui.md        |  4 +-
 .../025-optimizing-parquet-reading.md           | 40 +++++++++++
 7 files changed, 130 insertions(+), 21 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_data/docs.json
----------------------------------------------------------------------
diff --git a/_data/docs.json b/_data/docs.json
index 0a14e87..2887fbb 100644
--- a/_data/docs.json
+++ b/_data/docs.json
@@ -599,8 +599,8 @@
             "next_title": "Query Plans and Tuning", 
             "next_url": "/docs/query-plans-and-tuning/", 
             "parent": "Performance Tuning", 
-            "previous_title": "Partition Pruning", 
-            "previous_url": "/docs/partition-pruning/", 
+            "previous_title": "Optimizing Parquet Reading", 
+            "previous_url": "/docs/optimizing-parquet-reading/", 
             "relative_path": "_docs/performance-tuning/030-choosing-a-storage-format.md",
             "title": "Choosing a Storage Format", 
             "url": "/docs/choosing-a-storage-format/"
@@ -5625,6 +5625,23 @@
             "title": "Operators", 
             "url": "/docs/operators/"
         }, 
+        "Optimizing Parquet Reading": {
+            "breadcrumbs": [
+                {
+                    "title": "Performance Tuning", 
+                    "url": "/docs/performance-tuning/"
+                }
+            ], 
+            "children": [], 
+            "next_title": "Choosing a Storage Format", 
+            "next_url": "/docs/choosing-a-storage-format/", 
+            "parent": "Performance Tuning", 
+            "previous_title": "Partition Pruning", 
+            "previous_url": "/docs/partition-pruning/", 
+            "relative_path": "_docs/performance-tuning/025-optimizing-parquet-reading.md",
+            "title": "Optimizing Parquet Reading", 
+            "url": "/docs/optimizing-parquet-reading/"
+        }, 
         "PARTITION BY Clause": {
             "breadcrumbs": [
                 {
@@ -5671,8 +5688,8 @@
                 }
             ], 
             "children": [], 
-            "next_title": "Choosing a Storage Format", 
-            "next_url": "/docs/choosing-a-storage-format/", 
+            "next_title": "Optimizing Parquet Reading", 
+            "next_url": "/docs/optimizing-parquet-reading/", 
             "parent": "Performance Tuning", 
             "previous_title": "Performance Tuning Introduction", 
             "previous_url": "/docs/performance-tuning-introduction/", 
@@ -5725,8 +5742,8 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Choosing a Storage Format", 
-                    "next_url": "/docs/choosing-a-storage-format/", 
+                    "next_title": "Optimizing Parquet Reading", 
+                    "next_url": "/docs/optimizing-parquet-reading/", 
                     "parent": "Performance Tuning", 
                     "previous_title": "Performance Tuning Introduction", 
                     "previous_url": "/docs/performance-tuning-introduction/", 
@@ -5742,11 +5759,28 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Query Plans and Tuning", 
-                    "next_url": "/docs/query-plans-and-tuning/", 
+                    "next_title": "Choosing a Storage Format", 
+                    "next_url": "/docs/choosing-a-storage-format/", 
                     "parent": "Performance Tuning", 
                     "previous_title": "Partition Pruning", 
                     "previous_url": "/docs/partition-pruning/", 
+                    "relative_path": "_docs/performance-tuning/025-optimizing-parquet-reading.md",
+                    "title": "Optimizing Parquet Reading", 
+                    "url": "/docs/optimizing-parquet-reading/"
+                }, 
+                {
+                    "breadcrumbs": [
+                        {
+                            "title": "Performance Tuning", 
+                            "url": "/docs/performance-tuning/"
+                        }
+                    ], 
+                    "children": [], 
+                    "next_title": "Query Plans and Tuning", 
+                    "next_url": "/docs/query-plans-and-tuning/", 
+                    "parent": "Performance Tuning", 
+                    "previous_title": "Optimizing Parquet Reading", 
+                    "previous_url": "/docs/optimizing-parquet-reading/", 
                     "relative_path": "_docs/performance-tuning/030-choosing-a-storage-format.md",
                     "title": "Choosing a Storage Format", 
                     "url": "/docs/choosing-a-storage-format/"
@@ -13637,8 +13671,8 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Choosing a Storage Format", 
-                    "next_url": "/docs/choosing-a-storage-format/", 
+                    "next_title": "Optimizing Parquet Reading", 
+                    "next_url": "/docs/optimizing-parquet-reading/", 
                     "parent": "Performance Tuning", 
                     "previous_title": "Performance Tuning Introduction", 
                     "previous_url": "/docs/performance-tuning-introduction/", 
@@ -13654,11 +13688,28 @@
                         }
                     ], 
                     "children": [], 
-                    "next_title": "Query Plans and Tuning", 
-                    "next_url": "/docs/query-plans-and-tuning/", 
+                    "next_title": "Choosing a Storage Format", 
+                    "next_url": "/docs/choosing-a-storage-format/", 
                     "parent": "Performance Tuning", 
                     "previous_title": "Partition Pruning", 
                     "previous_url": "/docs/partition-pruning/", 
+                    "relative_path": "_docs/performance-tuning/025-optimizing-parquet-reading.md",
+                    "title": "Optimizing Parquet Reading", 
+                    "url": "/docs/optimizing-parquet-reading/"
+                }, 
+                {
+                    "breadcrumbs": [
+                        {
+                            "title": "Performance Tuning", 
+                            "url": "/docs/performance-tuning/"
+                        }
+                    ], 
+                    "children": [], 
+                    "next_title": "Query Plans and Tuning", 
+                    "next_url": "/docs/query-plans-and-tuning/", 
+                    "parent": "Performance Tuning", 
+                    "previous_title": "Optimizing Parquet Reading", 
+                    "previous_url": "/docs/optimizing-parquet-reading/", 
                     "relative_path": "_docs/performance-tuning/030-choosing-a-storage-format.md",
                     "title": "Choosing a Storage Format", 
                     "url": "/docs/choosing-a-storage-format/"
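
The `docs.json` hunks above splice the new page into the Performance Tuning navigation chain (Partition Pruning -> Optimizing Parquet Reading -> Choosing a Storage Format -> Query Plans and Tuning), so every `next_title` should be mirrored by the following page's `previous_title`. The Python sketch below checks that invariant on a hand-copied subset of the entries. `check_nav_chain` and `PAGES` are hypothetical names, not part of the Drill site tooling, and the sketch assumes that the entry whose `next_title` now points at Optimizing Parquet Reading is Partition Pruning, as the new entry's `previous_title` indicates.

```python
# Hypothetical consistency check for the prev/next links edited in _data/docs.json.
PAGES: dict[str, dict[str, str]] = {
    # Assumed to be the "Partition Pruning" entry (its title is outside the hunk).
    "Partition Pruning": {
        "previous_title": "Performance Tuning Introduction",
        "next_title": "Optimizing Parquet Reading",
    },
    "Optimizing Parquet Reading": {
        "previous_title": "Partition Pruning",
        "next_title": "Choosing a Storage Format",
    },
    "Choosing a Storage Format": {
        "previous_title": "Optimizing Parquet Reading",
        "next_title": "Query Plans and Tuning",
    },
}


def check_nav_chain(pages: dict[str, dict[str, str]]) -> list[str]:
    """Report every page whose next page does not point back to it."""
    problems = []
    for title, page in pages.items():
        nxt = page.get("next_title")
        if nxt not in pages:
            continue  # next page is outside the subset being checked
        back = pages[nxt].get("previous_title")
        if back != title:
            problems.append(f"{title} -> {nxt}, but {nxt} points back to {back}")
    return problems


if __name__ == "__main__":
    print(check_nav_chain(PAGES) or "navigation chain is consistent")
```

A stale `previous_title` left over from before this commit (Choosing a Storage Format still pointing back to Partition Pruning) would be reported as exactly one problem.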

http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_docs/configure-drill/078-configuring-web-ui-and-rest-api-security.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/078-configuring-web-ui-and-rest-api-security.md b/_docs/configure-drill/078-configuring-web-ui-and-rest-api-security.md
index fcde234..5599c64 100644
--- a/_docs/configure-drill/078-configuring-web-ui-and-rest-api-security.md
+++ b/_docs/configure-drill/078-configuring-web-ui-and-rest-api-security.md
@@ -18,7 +18,7 @@ Drill generates a self-signed certificate that works with SSL for HTTPS access t
 
 ## Setting Up a Custom SSL Configuration
 
-As cluster administrator, you can set the following SSL configuration parameters at in the `conf/drill-override.conf` file, as described in the [Java product documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/JSSERefGuide.html#Customization):
+As cluster administrator, you can set the following SSL configuration parameters in the `conf/drill-override.conf` file, as described in the [Java product documentation](http://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/JSSERefGuide.html#Customization):
 
 * javax.net.ssl.keyStore  
   Path to the application's certificate and private key in the Java keystore file.  
@@ -29,11 +29,12 @@ As cluster administrator, you can set the following SSL configuration parameters
 * javax.net.ssl.trustStorePassword  
   Password for accessing the trusted keystore file.
 
+<!-- 
 ## Prerequisites for Web Console and REST API Security
 
 You need to perform the following configuration tasks using Web Console and REST API security.
 
 
-* [User Authentication]({{site.baseurl}}/docs/configuring-user-authentication/)  
+* Configure [user authentication]({{site.baseurl}}/docs/configuring-user-authentication/)
 
 * Set up Web Console administrators  
   Optionally, you can set up Web Console administrator-user groups to facilitate management of multiple Web Console administrators.
 
@@ -62,7 +63,7 @@ The following table and subsections describe the privilege levels for accessing
 | getStatsJSON             | /stats.json                  | GET          | application/json | Returns Drillbit stats such as ports and max direct memory in json format. | ALL |
 | getStatus                | /status                      | GET          | text/html        | Returns Running! | ALL |
 | getSystemOptionsJSON     | /options.json                | GET          | application/json | Returns list of options. Each option consists of name-value-type-kind (for example: (boot system datatype). | ALL |
-| getSystemOptions         | /options                     | GET          | text/html        | Returns a HTML table where each row is a form containing the option details that allows option values to be modified. | ALL |
+| getSystemOptions         | /options                     | GET          | text/html        | Returns an HTML table where each row is a form containing the option details that allows option values to be modified. | ALL |
 | updateSystemOption       | /option/{optionName}         | POST         | text/html        | Updates the options and calls getSystemOptions. So again an option list is displayed. | ADMIN |
 | getStoragePluginsJSON    | /storage.json                | GET          | application/json | Returns a list of storage plugin wrappers each containing name-config (instance of StoragePluginConfig) and enabled. | ADMIN |
 | getStoragePlugins        | /storage                     | GET          | text/html        | Returns an HTML page with two sections: The first section contains a table of rows that are forms containing the plugin button for the update page and a button to disable the plugin. The second section is the same except the button enables the plugin. | ADMIN |
@@ -83,6 +84,7 @@ The following table and subsections describe the privilege levels for accessing
 | submitQuery              | /query                       | POST         | text/html        | Returns results from submitQueryJSON populated in a HTML table. | ALL |
 | getMetrics               | /metrics                     | GET          | text/html        | Returns a page that fetches metric info from resource, status, and metrics. | ALL |
 | getThreads               | /threads                     | GET          | text/html        | Returns a page that fetches metric information from resource, status, and threads. | ALL |
+
 ### GET /profiles.json
 
 * ADMIN - gets all profiles on the system.  
@@ -106,4 +108,4 @@ The following table and subsections describe the privilege levels for accessing
 ### GET /profiles/cancel/{queryid}
 
 * ADMIN - can cancel the query.  
-* USER - cancel the query only if the query is launched by the user requesting the cancellation.
\ No newline at end of file
+* USER - cancel the query only if the query is launched by the user requesting the cancellation. -->
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_docs/configure-drill/120-configuring-the-drill-shell.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/120-configuring-the-drill-shell.md b/_docs/configure-drill/120-configuring-the-drill-shell.md
index 11b9f67..935325b 100644
--- a/_docs/configure-drill/120-configuring-the-drill-shell.md
+++ b/_docs/configure-drill/120-configuring-the-drill-shell.md
@@ -34,7 +34,7 @@ The following table lists the commands that you can run on the Drill command lin
 
 ## Example of Hiding the Password When Starting Drill
 
-When starting Drill in authentication mode, you can use the **!connect** command as shown in the section, ["User Authentication Process"]({{site.baseurl}}/docs/configuring-user-authentication/#user-authentication-process), instead of the **sqlline**, **drill-embedded**, or **drill-distributed** commands. For example, after running the sqlline script, you enter this command to connect to Drill:
+When starting Drill in authentication mode, you can use the **!connect** command as shown in the section, ["User Authentication Process"]({{site.baseurl}}/docs/configuring-user-authentication/#user-authentication-process), instead of using a command such as **sqlline**, **drill-embedded**, or **drill-conf**. For example, after running the sqlline script, you enter this command to connect to Drill:
 
 `sqlline> !connect jdbc:drill:zk=localhost:2181`  
 

http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_docs/connect-a-data-source/050-workspaces.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/050-workspaces.md b/_docs/connect-a-data-source/050-workspaces.md
index 5989b87..080553d 100755
--- a/_docs/connect-a-data-source/050-workspaces.md
+++ b/_docs/connect-a-data-source/050-workspaces.md
@@ -5,7 +5,7 @@ parent: "Storage Plugin Configuration"
 You can define one or more workspaces in a [storage plugin configuration]({{site.baseurl}}/docs/plugin-configuration-basics/). The workspace defines the location of files in subdirectories of a local or distributed file system. Drill searches the workspace to locate data when
 you run a query. A hidden default workspace, `dfs.default`, points to the root of the file system.
 
-The following DFS storage plugin configuration shows some examples of defined workspaces:
+The following `dfs` storage plugin configuration shows some examples of defined workspaces:
 
        {
          "type": "file",
@@ -54,7 +54,7 @@ location of the data:
 ##Overriding `dfs.default`
 
 You may want to override the hidden default workspace in scenarios where users do not have permissions to access the root directory. 
-Add the following workspace entry to the DFS storage plugin configuration to override the default workspace:
+Add the following workspace entry to the `dfs` storage plugin configuration to override the default workspace:
 
     "default": {
       "location": "</directory/path>",

http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md b/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
index 678c270..03a5931 100644
--- a/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
+++ b/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
@@ -39,6 +39,22 @@ In the storage plugin configuration, you [set the attributes]({{site.baseurl}}/d
 
 Set the `sys.options` property setting `exec.storage.enable_new_text_reader` to true (the default) before attempting to use these attributes. 
 
+### Using Quotation Marks
+CSV files typically enclose text fields in double quotation marks, and Drill treats the double quotation mark in CSV files as a special character accordingly. By default, Drill also treats double quotation marks as a special character in TSV files. If you want Drill *not* to treat double quotation marks as a special character, configure the storage plugin to set the `quote` attribute to the Unicode null character, `"\u0000"`. For example:
+
+       . . .
+       "tsv": {
+         "type": "text",
+         "extensions": [
+           "tsv"
+         ],
+         "quote": "\u0000",    <-- set this to null
+         "delimiter": "\t"
+       },
+       . . .
+
+As mentioned previously, set the `sys.options` property setting `exec.storage.enable_new_text_reader` to true (the default).
+
 ## Examples of Querying Text Files
 The examples in this section show the results of querying CSV files that use and do not use a header, include comments, and use an escape character:
 
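
Drill's text reader is not shown here, but Python's standard `csv` module makes the effect of the `quote` setting described above easy to see. In this sketch, parsing with `quotechar='"'` stands in for Drill's default behavior, and `quoting=csv.QUOTE_NONE` stands in for setting `quote` to `"\u0000"`. The analogy is an assumption for illustration; Drill's own parser may differ in edge cases such as escaping.

```python
import csv
import io

# One TSV record whose second field begins with a double quotation mark.
TSV_LINE = 'id\t"quoted text\tstill quoted"\tend\n'


def parse_quote_special(text: str) -> list[list[str]]:
    # '"' is a quote character: the tab inside the quotes is part of the field.
    return list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))


def parse_quote_literal(text: str) -> list[list[str]]:
    # '"' is ordinary data: every tab is a field delimiter.
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    print(parse_quote_special(TSV_LINE))  # 3 fields
    print(parse_quote_literal(TSV_LINE))  # 4 fields, quotation marks kept
```

With the quote character active, the embedded tab is swallowed into one field; with it disabled, the same line splits into four fields and the quotation marks stay in the data.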

http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_docs/install/060-starting-the-web-ui.md
----------------------------------------------------------------------
diff --git a/_docs/install/060-starting-the-web-ui.md b/_docs/install/060-starting-the-web-ui.md
index df21bc6..deae089 100644
--- a/_docs/install/060-starting-the-web-ui.md
+++ b/_docs/install/060-starting-the-web-ui.md
@@ -13,7 +13,7 @@ In Drill 1.1 and earlier, to open the Drill Web Console, launch a web browser, a
 
 where IP address is the host name or IP address of one of the installed Drillbits in a distributed system or `localhost` in an embedded system.
 
-## Drill 1.2 and Later
+<!-- ## Drill 1.2 and Later
 
 In Drill 1.2 and later, to open the Drill Web Console, launch a web browser, and go to one of the following URLs depending on the configuration of HTTPS support:
 
@@ -36,6 +36,6 @@ If an [administrator]({{ site.baseurl }}/docs/configuring-user-authentication/#a
 
 If a user, who is not an administrator, logs in, the Web Console controls are limited to Query, Metrics, and Profiles. The Profiles tab for a non-administrator user contains the profiles of all queries the user issued either through ODBC, JDBC, or the Web Console. 
 
-![Web Console User View]({{ site.baseurl }}/docs/img/web-ui-user-view.png)
+![Web Console User View]({{ site.baseurl }}/docs/img/web-ui-user-view.png) -->
 
 

http://git-wip-us.apache.org/repos/asf/drill/blob/492ed5a2/_docs/performance-tuning/025-optimizing-parquet-reading.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/025-optimizing-parquet-reading.md b/_docs/performance-tuning/025-optimizing-parquet-reading.md
new file mode 100644
index 0000000..471af37
--- /dev/null
+++ b/_docs/performance-tuning/025-optimizing-parquet-reading.md
@@ -0,0 +1,40 @@
+---
+title: "Optimizing Parquet Reading"
+parent: "Performance Tuning"
+---
+
+Parquet metadata caching is an optional feature in Drill 1.2 and later. When you use this feature, Drill generates metadata cache files and stores them in the directory you specify and in each of its subdirectories. When you run a query on this directory or a subdirectory, Drill reads the metadata cache file during the query-planning phase instead of reading the metadata from the Parquet files themselves. Reading one metadata cache file instead of many Parquet files can make query planning faster.
+
+The Parquet metadata caching feature reduces only the cost of query planning. If planning is not a bottleneck, caching does not improve the overall query runtime.
+
+## When to Use Parquet Metadata Caching
+
+Use Parquet metadata caching only when the planning phase of a query takes longer than the execution phase. Before using this feature, run your query and compare the time spent on the logical (planning) operations with the time spent on the physical (execution) operations. If planning does not take longer than execution, do not create the Parquet metadata cache, because it cannot speed up the query under these conditions.
+
+## How to Trigger Generation of the Parquet Metadata Cache File
+
+The following command generates the Parquet metadata cache files in the `<path to table>` directory and its subdirectories:
+
+`REFRESH TABLE METADATA <path to table>`
+
+You need to run this command on a directory, nested or flat, only once during the session. Subsequent queries use the existing Parquet metadata cache as long as the Parquet data does not change, so Drill does not need to regenerate the metadata. If the data changes, for example, because you delete some data, the metadata needs updating, and Drill regenerates the Parquet metadata when you issue the next query.
+
+Queries that trigger regeneration of the metadata can take longer than other queries.
+
+## Example of Generating Parquet Metadata
+
+```
+0: jdbc:drill:schema=dfs> REFRESH TABLE METADATA t1;
++-------+----------------------------------------------+
+|  ok   |                   summary                    |
++-------+----------------------------------------------+
+| true  | Successfully updated metadata for table t1.  |
++-------+----------------------------------------------+
+1 row selected (0.445 seconds)
+```
+
+## How Drill Generates and Uses Parquet Metadata
+
+After you run the REFRESH TABLE METADATA command, Drill traverses the directory, including any nested directories, to find the Parquet files. From the footers of the files, Drill gathers metadata, such as row counts and node affinity based on HDFS block locations. For each directory level, Drill saves a summary of the information from the footers in a single Parquet metadata cache file. The summary at each level covers that level and all lower levels; consequently, after generating metadata, you can query nested directories from any level. For example, you can query a subdirectory of Parquet files because Drill stores a Parquet metadata cache file at each level.
+
+At planning time, Drill reads only the metadata file. Parquet metadata caching has no effect on execution time. At execution time, Drill reads the actual files.
\ No newline at end of file
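
The "When to Use Parquet Metadata Caching" rule on the new page reduces to a single comparison. The sketch below encodes it; the function name and the idea of feeding it planning and execution durations read from a query profile are assumptions for illustration, and the durations in the usage lines are made-up inputs, not measurements.

```python
def should_cache_parquet_metadata(planning_seconds: float, execution_seconds: float) -> bool:
    """Hypothetical helper: cache Parquet metadata only if planning dominates.

    Caching shortens planning only; execution still reads the Parquet files,
    so a query whose planning is not longer than its execution does not benefit.
    """
    if planning_seconds < 0 or execution_seconds < 0:
        raise ValueError("durations must be non-negative")
    return planning_seconds > execution_seconds


if __name__ == "__main__":
    # Illustrative durations, not measurements.
    print(should_cache_parquet_metadata(planning_seconds=12.0, execution_seconds=3.5))  # True
    print(should_cache_parquet_metadata(planning_seconds=0.4, execution_seconds=20.0))  # False
```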

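The new page says Drill regenerates the Parquet metadata on the next query after the data changes, but it does not say how Drill detects a change. The sketch below assumes a modification-time comparison: the cache counts as stale if it is missing, or if any directory or data file under the table is newer than it. `CACHE_FILE_NAME` and `cache_is_stale` are hypothetical, and this is not Drill's implementation.

```python
import os

CACHE_FILE_NAME = "parquet_metadata.cache"  # hypothetical placeholder name


def cache_is_stale(table_dir: str) -> bool:
    """Assumed rule: stale if the cache is missing or anything under the table is newer."""
    cache_path = os.path.join(table_dir, CACHE_FILE_NAME)
    if not os.path.exists(cache_path):
        return True  # no cache yet, so it must be generated
    cache_mtime = os.path.getmtime(cache_path)
    for dirpath, _dirnames, filenames in os.walk(table_dir):
        # Adding or deleting files changes the containing directory's mtime.
        if os.path.getmtime(dirpath) > cache_mtime:
            return True
        for name in filenames:
            if name == CACHE_FILE_NAME:
                continue  # cache files themselves never count as data changes
            if os.path.getmtime(os.path.join(dirpath, name)) > cache_mtime:
                return True
    return False
```

Checking directory times as well as file times is what lets a deletion, which leaves no file behind to inspect, still mark the cache as stale.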

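The "How Drill Generates and Uses Parquet Metadata" section describes a bottom-up summary in which each level's cache covers that level and everything below it. The sketch below models that with an in-memory tree. `Directory`, `refresh_metadata`, `planned_row_count`, and the row counts are hypothetical stand-ins; the sketch records only row counts (not the HDFS block locations Drill also gathers) and does not reflect Drill's actual cache format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Directory:
    """Hypothetical in-memory stand-in for one directory level of a table."""
    name: str
    # File name -> row count, standing in for metadata read from Parquet footers.
    file_row_counts: Dict[str, int] = field(default_factory=dict)
    subdirs: List["Directory"] = field(default_factory=list)
    # Summary "cache" written at this level by refresh_metadata().
    cache: Optional[Dict[str, int]] = None


def refresh_metadata(directory: Directory, prefix: str = "") -> Dict[str, int]:
    """Build each level's summary from its own files plus every lower level."""
    path = f"{prefix}{directory.name}/"
    summary = {path + name: rows for name, rows in directory.file_row_counts.items()}
    for sub in directory.subdirs:
        # Each subdirectory's summary is built (and cached) before this level's.
        summary.update(refresh_metadata(sub, path))
    directory.cache = summary
    return summary


def planned_row_count(directory: Directory) -> int:
    """At planning time, consult only this level's cache, never the files."""
    if directory.cache is None:
        raise RuntimeError("run refresh_metadata() first")
    return sum(directory.cache.values())


if __name__ == "__main__":
    year = Directory("2015", {"a.parquet": 10, "b.parquet": 5})
    table = Directory("t1", {"c.parquet": 1}, [year])
    refresh_metadata(table)
    print(planned_row_count(table))  # 16: all levels
    print(planned_row_count(year))   # 15: the subdirectory alone
```

Because every level keeps its own complete summary, a query on the `2015` subdirectory can be planned from that level's cache without consulting the parent.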