drill-commits mailing list archives

From bridg...@apache.org
Subject [22/31] drill git commit: resolve merge conflict
Date Wed, 25 Nov 2015 22:03:10 GMT
resolve merge conflict


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/375cdec9
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/375cdec9
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/375cdec9

Branch: refs/heads/gh-pages
Commit: 375cdec96397033cd4671f24f34bf4841d8c3e12
Parents: 2012263
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Wed Nov 25 10:35:02 2015 -0800
Committer: Kristine Hahn <khahn@maprtech.com>
Committed: Wed Nov 25 10:35:02 2015 -0800

----------------------------------------------------------------------
 .../035-plugin-configuration-basics.md          |  8 ++--
 .../080-drill-default-input-format.md           |  1 +
 ...ata-sources-and-file-formats-introduction.md |  1 +
 .../070-sequencefile-format.md                  | 44 ++++++++++++++++++++
 .../050-querying-sequence-files.md              | 33 +++++++++++++++
 5 files changed, 83 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/375cdec9/_docs/connect-a-data-source/035-plugin-configuration-basics.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/035-plugin-configuration-basics.md b/_docs/connect-a-data-source/035-plugin-configuration-basics.md
index 0c005db..fe77cb5 100644
--- a/_docs/connect-a-data-source/035-plugin-configuration-basics.md
+++ b/_docs/connect-a-data-source/035-plugin-configuration-basics.md
@@ -13,7 +13,7 @@ You can use the Drill Web Console to update or add a new storage plugin configur
 
 To create a name and new configuration:
 
-1. [Start the Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/).
 
+1. [Start the Drill shell]({{site.baseurl}}/docs/starting-drill-on-linux-and-mac-os-x/).
 2. [Start the Web Console]({{site.baseurl}}/docs/starting-the-web-console/).  
 3. On the Storage tab, enter a name in **New Storage Plugin**.
    Each configuration registered with Drill must have a distinct
@@ -84,13 +84,13 @@ The following table describes the attributes you configure for storage plugins i
   </tr>
   <tr>
     <td>"formats"</td>
-    <td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"avro"<br>"maprdb" *</td>
-    <td>yes if type is file</td>
+    <td>"psv"<br>"csv"<br>"tsv"<br>"parquet"<br>"json"<br>"avro"<br>"maprdb"<br>"sequencefile" *</td>
+    <td>yes</td>
     <td>One or more valid file formats for reading. Drill implicitly detects formats of some files based on extension or bits of data in the file; others require configuration.</td>
   </tr>
   <tr>
     <td>"formats" . . . "type"</td>
-    <td>"text"<br>"parquet"<br>"json"<br>"maprdb" *</td>
+    <td>"text"<br>"parquet"<br>"json"<br>"maprdb"<br>"avro"<br>"sequencefile" *</td>
     <td>yes</td>
     <td>Format type. You can define two formats, csv and psv, as type "Text", but having different delimiters. </td>
   </tr>

http://git-wip-us.apache.org/repos/asf/drill/blob/375cdec9/_docs/connect-a-data-source/080-drill-default-input-format.md
----------------------------------------------------------------------
diff --git a/_docs/connect-a-data-source/080-drill-default-input-format.md b/_docs/connect-a-data-source/080-drill-default-input-format.md
index 2de4d70..4b7ae46 100644
--- a/_docs/connect-a-data-source/080-drill-default-input-format.md
+++ b/_docs/connect-a-data-source/080-drill-default-input-format.md
@@ -18,6 +18,7 @@ Drill supports. Currently, Drill supports the following input types:
   * CSV, TSV, or PSV
   * Parquet
   * JSON
+  * Hadoop Sequence Files
 
 You must have a [defined workspace]({{ site.baseurl }}/docs/workspaces) before you can define a default input format.
 

http://git-wip-us.apache.org/repos/asf/drill/blob/375cdec9/_docs/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md b/_docs/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
index d468c40..5ecd859 100644
--- a/_docs/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
+++ b/_docs/data-sources-and-file-formats/010-data-sources-and-file-formats-introduction.md
@@ -17,6 +17,7 @@ Drill supports the following input formats for data:
 * PSV (Pipe-Separated-Values)
 * Parquet
 * MapR-DB*
+* Hadoop Sequence Files
 
 \* Only available when you install Drill on a cluster using the mapr-drill package.
 

http://git-wip-us.apache.org/repos/asf/drill/blob/375cdec9/_docs/data-sources-and-file-formats/070-sequencefile-format.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/070-sequencefile-format.md b/_docs/data-sources-and-file-formats/070-sequencefile-format.md
new file mode 100644
index 0000000..2e15d17
--- /dev/null
+++ b/_docs/data-sources-and-file-formats/070-sequencefile-format.md
@@ -0,0 +1,44 @@
+---
+title: "Sequence Files"
+parent: "Data Sources and File Formats"
+---
+
+Hadoop Sequence files (https://wiki.apache.org/hadoop/SequenceFile) are flat files storing binary key, value pairs.
+Drill projects sequence files as table with two columns - 'binary_key', 'binary_value' of type VARBINARY.
+
+
+### Storage plugin format for sequence files.
+
+    . . .
+    "sequencefile": {
+      "type": "sequencefile",
+      "extensions": [
+        "seq"
+      ]
+    },
+    . . .
+
+### Querying sequence file.
+
+    SELECT *
+    FROM dfs.tmp.`simple.seq`
+    LIMIT 1;
+    +--------------+---------------+
+    |  binary_key  | binary_value  |
+    +--------------+---------------+
+    | [B@70828f46  | [B@b8c765f    |
+    +--------------+---------------+
+
+
+simple.seq contains byte serialized strings as keys and values, we can convert them to strings.
+
+
+    SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8')
+    FROM dfs.tmp.`simple.seq`
+    LIMIT 1
+    ;
+    +-----------+-------------+
+    |  EXPR$0   |   EXPR$1    |
+    +-----------+-------------+
+    | key0      |   value0    |
+    +-----------+-------------+
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/375cdec9/_docs/query-data/query-a-file-system/050-querying-sequence-files.md
----------------------------------------------------------------------
diff --git a/_docs/query-data/query-a-file-system/050-querying-sequence-files.md b/_docs/query-data/query-a-file-system/050-querying-sequence-files.md
new file mode 100644
index 0000000..ceb77e2
--- /dev/null
+++ b/_docs/query-data/query-a-file-system/050-querying-sequence-files.md
@@ -0,0 +1,33 @@
+---
+title: "Querying Sequence Files"
+parent: "Querying a File System"
+---
+
+Sequence files are flat files storing binary key value pairs.
+Drill projects sequence files as table with two columns 'binary_key', 'binary_value'.
+
+
+### Querying sequence file.
+
+Start drill shell
+
+        SELECT *
+        FROM dfs.tmp.`simple.seq`
+        LIMIT 1;
+        +--------------+---------------+
+        |  binary_key  | binary_value  |
+        +--------------+---------------+
+        | [B@70828f46  | [B@b8c765f    |
+        +--------------+---------------+
+
+Since simple.seq contains byte serialized strings as keys and values, we can convert them to strings.
+
+        SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8')
+        FROM dfs.tmp.`simple.seq`
+        LIMIT 1
+        ;
+        +-----------+-------------+
+        |  EXPR$0   |   EXPR$1    |
+        +-----------+-------------+
+        | key0      |   value0    |
+        +-----------+-------------+
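
Note on the queries in the two new pages: the first query prints values such as [B@70828f46 because the columns hold raw bytes, and the second applies CONVERT_FROM(..., 'UTF8') to turn those bytes into readable strings. The following is a minimal Python sketch of that decoding step only. It is not Drill code; the sample row and the helper name convert_from_utf8 are hypothetical, and it assumes the stored bytes are plain UTF-8 text, as the commit describes for simple.seq.

```python
# Hypothetical stand-in for one row of simple.seq: both columns hold raw
# bytes, mirroring the VARBINARY binary_key / binary_value columns.
row = {"binary_key": b"key0", "binary_value": b"value0"}


def convert_from_utf8(raw: bytes) -> str:
    """Decode a byte string as UTF-8 text (illustrative helper, not a Drill API)."""
    return raw.decode("utf-8")


# Decode every column of the row, keeping the column names unchanged.
decoded = {name: convert_from_utf8(raw) for name, raw in row.items()}
print(decoded)
```

If the stored bytes were not valid UTF-8, the decode call in this sketch would raise UnicodeDecodeError, so the conversion only makes sense when the writer is known to have serialized strings.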

