drill-commits mailing list archives

From bridg...@apache.org
Subject drill-site git commit: edits
Date Fri, 02 Oct 2015 00:07:45 GMT
Repository: drill-site
Updated Branches:
  refs/heads/asf-site c3316159f -> 15acf365b


Project: http://git-wip-us.apache.org/repos/asf/drill-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill-site/commit/15acf365
Tree: http://git-wip-us.apache.org/repos/asf/drill-site/tree/15acf365
Diff: http://git-wip-us.apache.org/repos/asf/drill-site/diff/15acf365

Branch: refs/heads/asf-site
Commit: 15acf365b9ffdb4d31c75edeb3339514db310656
Parents: c331615
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Thu Oct 1 17:07:35 2015 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Thu Oct 1 17:07:35 2015 -0700

 docs/drill-introduction/index.html     | 26 ++++++++++----------------
 docs/parquet-format/index.html         | 20 --------------------
 docs/querying-parquet-files/index.html | 21 +--------------------
 docs/sql-extensions/index.html         | 10 +---------
 feed.xml                               |  4 ++--
 5 files changed, 14 insertions(+), 67 deletions(-)

diff --git a/docs/drill-introduction/index.html b/docs/drill-introduction/index.html
index 8233245..a411151 100644
--- a/docs/drill-introduction/index.html
+++ b/docs/drill-introduction/index.html
@@ -994,24 +994,18 @@ applications, while still providing the familiarity and ecosystem of
 the industry-standard query language. Drill provides plug-and-play integration
 with existing Apache Hive and Apache HBase deployments. </p>
-<h2 id="what&#39;s-new-in-apache-drill-1.2">What&#39;s New in Apache Drill
+<!-- ## What's New in Apache Drill 1.2
-<p>This release of Drill fixes <a href="">many issues</a> and introduces a number of enhancements, including the following ones:</p>
+This release of Drill fixes [many issues]() and introduces a number of enhancements, including the following ones:
-<li>A number of new <a href="/docs/sql-window-functions">SQL window functions</a><br>
-<li><a href="/docs/ranking-window-functions/#ntile">NTILE</a><br></li>
-<li><a href="/docs/value-window-functions/#lag-lead">LEAD and LEAD</a><br></li>
-<li><a href="/docs/value-window-functions/#first_value-last_value">FIRST_VALUE and LAST_VALUE</a><br></li>
-<li><a href="/docs/configuring-web-console-and-rest-api-security/">Security</a> for Web Console and REST API operations<br></li>
-<li>Performance improvements for <a href="/docs/querying-hbase/#querying-big-endian-encoded-data">querying HBase</a>, which includes leveraging <a href="/docs/querying-hbase/#leveraging-hbase-ordered-byte-encoding">ordered byte encoding</a>.</li>
-<li>Parquet metadata caching for performantly reading large numbers of Parquet files</li>
-<li><a href="/docs/querying-hive/#optimizing-reads-of-parquet-backed-tables">Optimized reads</a> of Parquet-backed, Hive tables</li>
-<li>Read support for the <a href="/docs/parquet-format/#about-int96-support">Parquet INT96 type</a> and a new TIMESTAMP_IMPALA type used with the <a href="/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions">CONVERT_FROM</a> function decodes a timestamp from Hive or Impala.<br></li>
+* A number of new [SQL window functions](/docs/sql-window-functions)  
+  * [NTILE](/docs/ranking-window-functions/#ntile)  
+  * [LEAD and LEAD](/docs/value-window-functions/#lag-lead)  
+  * [FIRST_VALUE and LAST_VALUE](/docs/value-window-functions/#first_value-last_value)  
+* [Security](/docs/configuring-web-console-and-rest-api-security/) for Web Console and REST API operations  
+* Performance improvements for [querying HBase](/docs/querying-hbase/#querying-big-endian-encoded-data), which includes leveraging [ordered byte encoding](/docs/querying-hbase/#leveraging-hbase-ordered-byte-encoding)
+* [Optimized reads](/docs/querying-hive/#optimizing-reads-of-parquet-backed-tables) of Parquet-backed, Hive tables  
+* Read support for the [Parquet INT96 type](/docs/parquet-format/#about-int96-support) and a new TIMESTAMP_IMPALA type used with the [CONVERT_FROM](/docs/supported-data-types/#data-types-for-convert_to-and-convert_from-functions) function decodes a timestamp from Hive or Impala.   -->
 <h2 id="what&#39;s-new-in-apache-drill-1.1">What&#39;s New in Apache Drill

diff --git a/docs/parquet-format/index.html b/docs/parquet-format/index.html
index 662f928..bb0e11e 100644
--- a/docs/parquet-format/index.html
+++ b/docs/parquet-format/index.html
@@ -1010,26 +1010,6 @@
 <p>When a read of Parquet data occurs, Drill loads only the necessary columns of data, which reduces I/O. Reading only a small piece of the Parquet data from a data file or table, Drill can examine and analyze all values for a column across multiple files. You can create a Drill table from one format and store the data in another format, including Parquet.</p>
-<h2 id="caching-metadata">Caching Metadata</h2>
-<p>For performant querying of a large number of files, Drill 1.2 and later can take advantage of metadata, such as the Hive metadata store, and includes the capability of generating a metadata cache for performant querying of thousands of Parquet files. The metadata cache is not a central caching system, but simply one or more files of metadata. Drill generates and saves a cache of metadata in each directory in nested directories. You trigger the generation of metadata caches by running the REFRESH TABLE METADATA command, as described in <a href="/docs/querying-parquet-files/">Querying Parquet Files</a>.</p>
-<p>After generating the metadata cache, Drill performs the following tasks during the planning phase for a query on a directory of Parquet files:</p>
-<li>Finds files.<br></li>
-<li>Recurses directories.<br></li>
-<li>Reads the footers of files to get information, such as row counts and HDFS block locations for every file for Drill to assign work based on locality.<br>
-When Drill reads the file, it attempts to execute the query on the node where the data rests.<br></li>
-<li>Summarizes the information from the footers in a single metadata cache file.<br></li>
-<li>Stores the metadata cache file at each level that covers that particular level and all lower levels.</li>
-<p>At execution time, Drill reads the actual files. At planning time, Drill reads only the metadata file. </p>
-<p>The first query that does not see the metadata file will gather the metadata, so the elapsed time of the first query will be very different from a subsequent 
-query. </p>
 <h2 id="writing-parquet-files">Writing Parquet Files</h2>
 <p>CREATE TABLE AS (CTAS) can use any data source provided by the storage plugin. To write Parquet data using the CTAS command, set the session store.format option as shown in the next section. Alternatively, configure the storage plugin to point to the directory containing the Parquet files.</p>

diff --git a/docs/querying-parquet-files/index.html b/docs/querying-parquet-files/index.html
index 0aa6e57..62e4e5d 100644
--- a/docs/querying-parquet-files/index.html
+++ b/docs/querying-parquet-files/index.html
@@ -989,26 +989,7 @@
     <div class="int_text" align="left">
-        <p>Drill 1.2 and later extends SQL for performant querying of a large number, thousands or more, of Parquet files. By running the following command, you trigger the generation of metadata files in the directory of Parquet files and its subdirectories:</p>
-<p><code>REFRESH TABLE METADATA &lt;path to table&gt;</code></p>
-<p>You need to run the command on a file or directory only once during the session. Subsequent queries return results quickly because Drill refers to the metadata saved in the cache, as described in <a href="/docs/parquet-format/#reading-parquet-files">Reading Parquet Files</a>. </p>
-<p>You can query nested directories from any level. For example, you can query a sub-sub-directory of Parquet files because Drill stores a metadata cache of information at each level that covers that particular level and all lower levels. </p>
-<h2 id="example-of-generating-parquet-metadata">Example of Generating Parquet Metadata</h2>
-<div class="highlight"><pre><code class="language-text" data-lang="text">0: jdbc:drill:schema=dfs&gt; REFRESH TABLE METADATA t1;
-|  ok   |                   summary                    |
-| true  | Successfully updated metadata for table t1.  |
-1 row selected (0.445 seconds)
-<h2 id="sample-parquet-files">Sample Parquet Files</h2>
-<p>The Drill installation includes a <code>sample-data</code> directory with Parquet files
+        <p>The Drill installation includes a <code>sample-data</code> directory with Parquet files
 that you can query. Use SQL to query the <code>region.parquet</code> and
 <code>nation.parquet</code> files in the <code>sample-data</code>

diff --git a/docs/sql-extensions/index.html b/docs/sql-extensions/index.html
index bd44ac3..dbf869e 100644
--- a/docs/sql-extensions/index.html
+++ b/docs/sql-extensions/index.html
@@ -987,20 +987,12 @@
     <div class="int_text" align="left">
-        <p>Drill extends SQL to generating Parquet metadata, to work with Hadoop-scale data, and to explore smaller-scale data in ways not possible with SQL. Using intuitive SQL extensions you work with self-describing data and complex data types. Extensions to SQL include capabilities for exploring self-describing data, such as files and HBase, directly in the native format.</p>
+        <p>Drill extends SQL to explore smaller-scale data in ways not possible with SQL. Using intuitive SQL extensions you work with self-describing data and complex data types. Extensions to SQL include capabilities for exploring self-describing data, such as files and HBase, directly in the native format.</p>
 <p>Drill provides language support for pointing to <a href="/docs/connect-a-data-source-introduction">storage plugin</a> interfaces that Drill uses to interact with data sources. Use the name of a storage plugin to specify a file system <em>database</em> as a prefix in queries when you refer to objects across databases. Query files, including compressed .gz files, and <a href="/docs/querying-directories">directories</a>, as you would query an SQL table. You can query multiple files in a directory.</p>
 <p>Drill extends the SELECT statement for reading complex, multi-structured data. The extended CREATE TABLE AS provides the capability to write data of complex/multi-structured data types. Drill extends the <a href="http://drill.apache.org/docs/lexical-structure">lexical rules</a> for working with files and directories, such as using back ticks for including file names, directory names, and reserved words in queries. Drill syntax supports using the file system as a persistent store for query profiles and diagnostic information.</p>
-<h2 id="extension-for-generating-parquet-metadata">Extension for Generating Parquet
-<p>To speed querying of Parquet files, you can <a href="/docs/querying-parquet-files/">generate metadata</a> in Drill 1.2 and later. Running the following command triggers the generation of metadata files in a directory of Parquet files and its subdirectories:</p>
-<p><code>REFRESH TABLE METADATA &lt;path to table&gt;</code></p>
-<p>Drill takes advantage of metadata, such as the Hive metadata store, and generates a <a href="/docs/parquet-format/#caching-metadata">metadata cache</a>. Using metadata can improve performance of queries on a large number of files. </p>
 <h2 id="extensions-for-hive--and-hbase-related-data-sources">Extensions for Hive- and HBase-related Data Sources</h2>
 <p>Drill supports Hive and HBase as a plug-and-play data source. Drill can read tables created in Hive that use <a href="/docs/hive-to-drill-data-type-mapping">data types compatible</a> with Drill.  You can query Hive tables without modifications. You can query self-describing data without requiring metadata definitions in the Hive metastore. Primitives, such as JOIN, support columnar operation. </p>

diff --git a/feed.xml b/feed.xml
index 8754162..45acab7 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 01 Oct 2015 15:48:20 -0700</pubDate>
-    <lastBuildDate>Thu, 01 Oct 2015 15:48:20 -0700</lastBuildDate>
+    <pubDate>Thu, 01 Oct 2015 17:04:08 -0700</pubDate>
+    <lastBuildDate>Thu, 01 Oct 2015 17:04:08 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
