From: bridgetb@apache.org
To: commits@drill.apache.org
Date: Fri, 17 Jul 2015 01:22:58 -0000
Subject: [1/2] drill-site git commit: Drill edits for 1.1

Repository: drill-site
Updated Branches:
  refs/heads/asf-site 94a79d4ed -> d0ba86f3b

http://git-wip-us.apache.org/repos/asf/drill-site/blob/d0ba86f3/docs/text-files-csv-tsv-psv/index.html
----------------------------------------------------------------------
diff --git a/docs/text-files-csv-tsv-psv/index.html b/docs/text-files-csv-tsv-psv/index.html
index d7dc8c3..4c411e1 100644
--- a/docs/text-files-csv-tsv-psv/index.html
+++ b/docs/text-files-csv-tsv-psv/index.html
@@ -1030,72 +1030,18 @@ VARCHARS, rather than individual columns. While parquet supports and Drill reads

Configuring Drill to Read Text Files

-

In the storage plugin configuration, you can set the following attributes that affect how Drill reads CSV, TSV, PSV (comma-, tab-, pipe-separated) files.

+

In the storage plugin configuration, you set the attributes that affect how Drill reads CSV, TSV, PSV (comma-, tab-, pipe-separated) files:

    -
  • String lineDelimiter = "\n";
    -One or more characters used to denote a new record. Allows reading files with Windows line endings.
  • -
  • char fieldDelimiter = ',';
    -A single character used to separate each value.
  • -
  • char quote = '"';
    -A single character used to start/end a value enclosed in quotation marks.
  • -
  • char escape = '"';
    -A single character used to escape a quotation mark inside a value.
  • -
  • char comment = '#';
    -A single character used to denote a comment line.
  • -
  • boolean skipFirstLine = false;
    -Set to true to avoid reading headers as data.
  • +
  • comment
  • +
  • escape
  • +
  • delimiter
  • +
  • quote
  • +
  • skipFirstLine
-

For more information about storage plugin configuration, see "List of Attributes and Definitions".

+

Set the sys.options property exec.storage.enable_new_text_reader to true (the default) before attempting to use these attributes.

-

You can deal with a mix of text files with and without headers either by creating two separate format plugins or by creating two format plugins within the same storage plugin. The former approach is typically easier than the latter.

- -

Creating Two Separate Format Plugins

- -

Format plugins are associated with a particular storage plugin. Storage plugins define a root directory that Drill targets when using the storage plugin. You can define separate storage plugins for different root directories, and define each of the format attributes to match the files stored below that directory. All files can use the .csv extension, as shown in the following example:

- -

Storage Plugin A

-
"csv": {
-  "type": "text",
-  "extensions": [
-    "csv"
-  ],
-  "delimiter": ","
-},
-. . .
-
-

Storage Plugin B

-
"csv": {
-  "type": "text",
-  "extensions": [
-    "csv"
-  ],
-  "comment": "&",
-  "skipFirstLine": true,
-  "delimiter": ","
-},
-
-

Creating Two Format Plugins within the Same Storage Plugin

- -

Give a different extension to files with a header and to files without a header, and use a storage plugin that looks something like the following example. This method requires renaming some files to use the csv2 extension, as shown in the following example:

-
"csv": {
-  "type": "text",
-  "extensions": [
-    "csv"
-  ],
-  "delimiter": ","
-},
-"csv_with_header": {
-  "type": "text",
-  "extensions": [
-    "csv2"
-  ],
-  "comment": "&",
-  "skipFirstLine": true,
-  "delimiter": ","
-},
-

Examples of Querying Text Files

The examples in this section show the results of querying CSV files that use and do not use a header, include comments, and use an escape character:

@@ -1167,6 +1113,57 @@ Set to true to avoid reading headers as data.
 +------------------------+
 7 rows selected (0.111 seconds)
+

Strategies for Using Attributes

+ +

The attributes, such as skipFirstLine, apply to all workspaces defined in a storage plugin. A typical use case defines separate storage plugins for different root directories to query the files stored below the directory. An alternative use case defines multiple formats within the same storage plugin and names target files using different extensions to match the formats.

+ +

You can deal with a mix of text files with and without headers either by creating two separate storage plugin configurations or by creating one storage plugin configuration that handles multiple formats. The former approach is typically easier than the latter.

+ +

Creating Two Separate Storage Plugin Configurations

+ +

A storage plugin configuration defines a root directory that Drill targets. You can use a different configuration for each root directory that sets attributes to match the files stored below that directory. All files can use the same extension, such as .csv, as shown in the following example:

+ +

Storage Plugin A

+
"csv": {
+  "type": "text",
+  "extensions": [
+    "csv"
+  ],
+  "delimiter": ","
+},
+. . .
+
+

Storage Plugin B

+
"csv": {
+  "type": "text",
+  "extensions": [
+    "csv"
+  ],
+  "comment": "&",
+  "skipFirstLine": true,
+  "delimiter": ","
+},
+
+

Creating One Storage Plugin Configuration to Handle Multiple Formats

+ +

You can use a different extension for files with and without a header, and use a storage plugin that looks something like the following example. This method requires renaming some files to use the csv2 extension.

+
"csv": {
+  "type": "text",
+  "extensions": [
+    "csv"
+  ],
+  "delimiter": ","
+},
+"csv_with_header": {
+  "type": "text",
+  "extensions": [
+    "csv2"
+  ],
+  "comment": "&",
+  "skipFirstLine": true,
+  "delimiter": ","
+},
+
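As a rough illustration of the attributes discussed in this diff, the statements below check the new text reader option, re-enable it if needed, and then query one file without a header and one with a header. The plugin name `dfs` and both file paths are assumptions made for this sketch; they do not appear in the commit.

```sql
-- Check the current value of the option named in the text.
SELECT * FROM sys.options WHERE name = 'exec.storage.enable_new_text_reader';

-- Re-enable the new text reader if it was turned off (true is the default).
ALTER SYSTEM SET `exec.storage.enable_new_text_reader` = true;

-- Hypothetical file without a header, matched by the "csv" format.
SELECT columns[0], columns[1] FROM dfs.`/data/no_header.csv` LIMIT 5;

-- Hypothetical file renamed to .csv2 so that the "csv_with_header" format,
-- which sets skipFirstLine to true, does not return the header row as data.
SELECT columns[0], columns[1] FROM dfs.`/data/with_header.csv2` LIMIT 5;
```

The second query only skips the header if the storage plugin configuration in use contains a format like the `csv_with_header` entry shown above.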
http://git-wip-us.apache.org/repos/asf/drill-site/blob/d0ba86f3/docs/workspaces/index.html
----------------------------------------------------------------------
diff --git a/docs/workspaces/index.html b/docs/workspaces/index.html
index 96e001f..b1bfe78 100644
--- a/docs/workspaces/index.html
+++ b/docs/workspaces/index.html
@@ -1003,33 +1003,31 @@
-When you register an instance of a file system data source, you can configure
-one or more workspaces for the instance. The workspace defines the directory location of files in a local or distributed file system. Drill searches the workspace to locate data when
+You can define one or more workspaces in a storage plugin configuration. The workspace defines the directory location of files in a local or distributed file system. Drill searches the workspace to locate data when
 you run a query. The default workspace points to the root of the file system.

-Configuring workspaces in the storage plugin definition to include the file location simplifies the query, which is important when querying the same data source repeatedly. After you configure a long path name in the workspaces location property, instead of
+Configuring workspaces to include a file location simplifies the query, which is important when querying the same data source repeatedly. After you configure a long path name in the workspaces location property, instead of
 using the full path to the data source, you use dot notation in the FROM clause.

<workspaces>.`<location>`

-To query the data source while you are not connected to
-that storage plugin, include the plugin name. This syntax assumes you did not issue a USE statement to connect to a storage plugin that defines the
+To query the data source while you are not using that storage plugin, include the plugin name. This syntax assumes you did not issue a USE statement to connect to a storage plugin that defines the
 location of the data:

<plugin>.<workspaces>.`<location>`

No Workspaces for Hive and HBase

-You cannot create workspaces for
-hive and hbase storage plugins, though Hive databases show up as workspaces in
+You cannot configure workspaces for
+hive and hbase, though Hive databases show up as workspaces in
 Drill. Each hive instance includes a default workspace that points to the Hive metastore. When you query files and tables in the hive default workspaces, you can omit the workspace name from the query.

 For example, you can issue a query on a Hive table in the default workspace
-using either of the following formats and get the same results:
+using either of the following queries and get the same results:

Example

SELECT * FROM hive.customers LIMIT 10;
@@ -1040,13 +1038,10 @@ SELECT * FROM hive.`default`.customers LIMIT 10;
   

Default is a reserved word. You must enclose reserved words in back ticks.

-Because HBase instances do not have workspaces, you can use the following
-format to query a table in HBase:
+Because the HBase storage plugin configuration does not have a workspace, you can use the following
+query:

SELECT * FROM hbase.customers LIMIT 10;
 
-After you register a data source as a storage plugin instance with Drill, and
-optionally configure workspaces, you can query the data source.
-
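The two dot-notation forms described in the workspaces page can be sketched as follows. The plugin name `dfs`, the workspace name `logs`, and the file `2015/events.csv` are hypothetical names chosen for illustration and are not taken from the commit.

```sql
-- Without a prior USE statement, include the plugin name:
-- <plugin>.<workspaces>.`<location>`
SELECT * FROM dfs.logs.`2015/events.csv` LIMIT 10;

-- After selecting the storage plugin with USE, the plugin name can be omitted:
-- <workspaces>.`<location>`
USE dfs;
SELECT * FROM logs.`2015/events.csv` LIMIT 10;
```

In both forms the location is enclosed in back ticks, just as the reserved word `default` is in the Hive example above.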
http://git-wip-us.apache.org/repos/asf/drill-site/blob/d0ba86f3/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index 3e6c72f..8e3f4ea 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 /
-Tue, 07 Jul 2015 18:15:20 -0700
-Tue, 07 Jul 2015 18:15:20 -0700
+Thu, 16 Jul 2015 18:16:54 -0700
+Thu, 16 Jul 2015 18:16:54 -0700
 Jekyll v2.5.2