drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: update per DRILL-4515
Date Mon, 21 Mar 2016 19:18:22 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 93accd167 -> b6278490a


update per DRILL-4515


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/b6278490
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/b6278490
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/b6278490

Branch: refs/heads/gh-pages
Commit: b6278490a5774dd9b701146cf0088f984106a6fe
Parents: 93accd1
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Mon Mar 21 12:16:13 2016 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Mon Mar 21 12:16:13 2016 -0700

----------------------------------------------------------------------
 .../060-text-files-csv-tsv-psv.md                         | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/b6278490/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md b/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
index 07a65d2..e41a245 100644
--- a/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
+++ b/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
@@ -1,6 +1,6 @@
 ---
 title: "Text Files: CSV, TSV, PSV"
-date:  
+date: 2016-03-21 19:16:15 UTC
 parent: "Data Sources and File Formats"
 ---
 
@@ -12,15 +12,15 @@ Best practices for reading text files are:
 
 ### Select Data from Particular Columns
 
-Converting text files to another format, such as Parquet, using the CTAS command and a SELECT * statement is not recommended. Instead, you should select data from particular columns. If your text file have no headers, use the [COLUMN[n] syntax]({{site.baseurl}}/docs/querying-plain-text-files), and then assign meaningful column names using aliases. For example:
+Converting text files to another format, such as Parquet, using the CTAS command and a SELECT * statement is not recommended. Instead, you should select data from particular columns. If your text files have no headers, use the [COLUMN[n] syntax]({{site.baseurl}}/docs/querying-plain-text-files), and then assign meaningful column names using aliases. For example:
 
     CREATE TABLE parquet_users AS SELECT CAST(COLUMNS[0] AS INT) AS user_id,
     COLUMNS[1] AS username, CAST(COLUMNS[2] AS TIMESTAMP) AS registration_date
     FROM `users.csv1`;
 
-You need to select particular columns instead of using SELECT * for performance reasons. Drill reads CSV, TSV, and PSV files into a list of VARCHARS, rather than individual columns. While parquet supports and Drill reads lists, as of this release of Drill, the read path for complex data is not optimized. 
+You need to select particular columns instead of using SELECT * for performance reasons. Drill reads CSV, TSV, and PSV files into a list of VARCHARS, rather than individual columns. 
 
-If your text file have headers, you can enable extractHeader and select particular columns by name. For example:
+If your text files have headers, you can enable extractHeader and select particular columns by name. For example:
 
     CREATE TABLE parquet_users AS SELECT CAST(user_id AS INT) AS user_id,
     username, CAST(registration_date AS TIMESTAMP) AS registration_date
@@ -45,7 +45,7 @@ Text files that include empty strings might produce unacceptable results. Common
 
 
 ### Use a Distributed File System
-Using a distributed file system, such as HDFS, instead of a local file system to query the files also improves performance because currently Drill does not split files on block splits.
+Using a distributed file system, such as HDFS, instead of a local file system to query files improves performance because Drill attempts to split files on block boundaries. i
 
 ## Configuring Drill to Read Text Files
 In the storage plugin configuration, you [set the attributes]({{site.baseurl}}/docs/plugin-configuration-basics/#list-of-attributes-and-definitions) that affect how Drill reads CSV, TSV, PSV (comma-, tab-, pipe-separated) files:  

