drill-commits mailing list archives

From bridg...@apache.org
Subject drill-site git commit: Doc updates for Drill 1.11
Date Tue, 08 Aug 2017 02:27:47 GMT
Repository: drill-site
Updated Branches:
  refs/heads/asf-site e1cf1a3e6 -> bf475de9d


Doc updates for Drill 1.11


Project: http://git-wip-us.apache.org/repos/asf/drill-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill-site/commit/bf475de9
Tree: http://git-wip-us.apache.org/repos/asf/drill-site/tree/bf475de9
Diff: http://git-wip-us.apache.org/repos/asf/drill-site/diff/bf475de9

Branch: refs/heads/asf-site
Commit: bf475de9de8e2ba2794ae13aac33c9ff1104e195
Parents: e1cf1a3
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Mon Aug 7 19:27:32 2017 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Mon Aug 7 19:27:32 2017 -0700

----------------------------------------------------------------------
 docs/plugin-configuration-basics/index.html |  6 +--
 docs/query-profiles/index.html              | 69 ++++++++++--------------
 docs/start-up-options/index.html            | 18 ++++---
 feed.xml                                    |  4 +-
 4 files changed, 45 insertions(+), 52 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill-site/blob/bf475de9/docs/plugin-configuration-basics/index.html
----------------------------------------------------------------------
diff --git a/docs/plugin-configuration-basics/index.html b/docs/plugin-configuration-basics/index.html
index 61df700..c04860c 100644
--- a/docs/plugin-configuration-basics/index.html
+++ b/docs/plugin-configuration-basics/index.html
@@ -1126,7 +1126,7 @@
 
     </div>
 
-     Aug 4, 2016
+     Aug 8, 2017
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1222,13 +1222,13 @@ Using a copy of an existing configuration reduces the risk of JSON
coding errors
   </tr>
   <tr>
     <td>&quot;formats&quot;</td>
-    <td>&quot;psv&quot;<br>&quot;csv&quot;<br>&quot;tsv&quot;<br>&quot;parquet&quot;<br>&quot;json&quot;<br>&quot;avro&quot;<br>&quot;maprdb&quot;<br>&quot;sequencefile&quot;</td>
+    <td>&quot;pcap&quot;<br>&quot;psv&quot;<br>&quot;csv&quot;<br>&quot;tsv&quot;<br>&quot;parquet&quot;<br>&quot;json&quot;<br>&quot;avro&quot;<br>&quot;maprdb&quot;<br>&quot;sequencefile&quot;</td>
     <td>yes</td>
     <td>One or more valid file formats for reading. Drill detects formats of some files;
others require configuration. The maprdb format is in installations of the mapr-drill package.
 </td>
   </tr>
   <tr>
     <td>&quot;formats&quot; . . . &quot;type&quot;</td>
-    <td>&quot;text&quot;<br>&quot;parquet&quot;<br>&quot;json&quot;<br>&quot;maprdb&quot;<br>&quot;avro&quot;<br>&quot;sequencefile&quot;</td>
+    <td>&quot;pcap&quot;<br>&quot;text&quot;<br>&quot;parquet&quot;<br>&quot;json&quot;<br>&quot;maprdb&quot;<br>&quot;avro&quot;<br>&quot;sequencefile&quot;</td>
     <td>yes</td>
     <td>Format type. You can define two formats, csv and psv, as type &quot;Text&quot;,
but having different delimiters. </td>
   </tr>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/bf475de9/docs/query-profiles/index.html
----------------------------------------------------------------------
diff --git a/docs/query-profiles/index.html b/docs/query-profiles/index.html
index 82fc80f..2ffa68a 100644
--- a/docs/query-profiles/index.html
+++ b/docs/query-profiles/index.html
@@ -1126,100 +1126,89 @@
 
     </div>
 
-     Nov 21, 2016
+     Aug 8, 2017
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>A profile is a summary of metrics collected for each query that Drill executes.
Query profiles provide information that you can use to monitor and analyze query performance.
Drill creates a query profile from major, minor, operator, and input stream profiles. Each
major fragment profile consists of a list of minor fragment profiles. Each minor fragment
profile consists of a list of operator profiles. An operator profile consists of a list of
input stream profiles. </p>
+        <p>A profile is a summary of metrics collected for each query that Drill executes.
Query profiles provide information that you can use to monitor and analyze query performance.
When Drill executes a query, Drill writes the profile of each query to disk, which is either
the local filesystem or a distributed file system, such as HDFS. As of Drill 1.11, Drill can
<a href="/docs/start-up-options/#configuring-start-up-options">store profiles in memory</a>
instead of storing them to disk. You can view query profiles in the Drill Web Console at <code>http://&lt;IP
address or host name&gt;:8047</code>.  </p>
 
-<p>You can view aggregate statistics across profile lists in the Profile tab of the
Drill Web Console at <code>&lt;drill_node_ip_address&gt;:8047</code>.
You can modify and resubmit queries, or cancel queries. For debugging purposes, you can use
profiles in conjunction with Drill logs. See Log and Debug.</p>
+<h2 id="query-profiles-in-the-drill-web-console">Query Profiles in the Drill Web Console</h2>
 
-<p>Metrics in a query profile are associated with a coordinate system of IDs. Drill
uses a coordinate system comprised of query, fragment, and operator identifiers to track query
execution activities and resources. Drill assigns a unique QueryID to each query received
and then assigns IDs to each fragment and operator that executes the query.</p>
+<p>You can access query profiles in the Drill Web Console. The Drill Web Console provides
aggregate statistics across profile lists. Profile lists consist of data from major and minor
fragments, operators, and input streams. You can use profiles in conjunction with Drill logs
for debugging purposes. In addition to viewing query profiles, you can modify, resubmit, or
cancel queries from the Drill Web Console.  </p>
 
-<p><strong>Example IDs</strong></p>
+<h3 id="query,-fragment,-and-operator-identifiers">Query, Fragment, and Operator Identifiers</h3>
 
-<p>QueryID: 2aa98add-15b3-e155-5669-603c03bfde86</p>
-
-<p>Fragment and operator IDs:  </p>
+<p>Metrics in a query profile are associated with a coordinate system of identifiers.
Drill uses a coordinate system comprised of query, fragment, and operator identifiers to track
query execution activities and resources. Drill assigns a unique identifier, the QueryID,
to each query received and then assigns an identifier to each fragment and operator that executes
the query. An example of a QueryID is 2aa98add-15b3-e155-5669-603c03bfde86. The following
images shows an example of fragment and operator identifiers:</p>
 
 <p><img src="/docs/img/xx-xx-xx.png" alt="">  </p>
 
-<h2 id="viewing-a-query-profile">Viewing a Query Profile</h2>
+<h3 id="viewing-a-query-profile">Viewing a Query Profile</h3>
 
-<p>When you select the Profiles tab in the Drill Web Console at <code>&lt;drill_node_ip_address&gt;:8047</code>,
you see a list of the last 100 queries than have run or that are currently running in the
cluster.  </p>
+<p>You can view query profiles in the Profiles tab of the Drill Web Console. When you
select the Profiles tab, you see a list of the last 100 queries than ran or are currently
running in the cluster.  </p>
 
 <p><img src="/docs/img/list_queries.png" alt=""></p>
 
-<p>You can click on any query to see its profile.  </p>
+<p>You must click on a query to see its profile.  </p>
 
 <p><img src="/docs/img/query_profile.png" alt="">  </p>
 
-<p>When you select a profile, notice that the URL in the address bar contains the QueryID.
For example, 2aa98add-15b3-e155-5669-603c03bfde86 in the following URL:</p>
+<p>When you select a profile, notice that the URL in the address bar contains the QueryID,
as shown in the following URL:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">
  http://&lt;drill_node&gt;:8047/profiles/2aa98add-15b3-e155-5669-603c03bfde86
 </code></pre></div>
-<p>The Query Profile section in the Query profile summarizes a few key details about
the query, including: </p>
+<p>The Query Profile section summarizes a few key details about the query, including:
</p>
 
 <ul>
 <li>The state of the query, either running, completed, or failed.<br></li>
 <li>The node operating as the Foreman; the Drillbit that receives a query from the
client or application becomes the Foreman and drives the entire query. </li>
-<li>The total number of minor fragments required to execute the query</li>
+<li>The total number of minor fragments required to execute the query.</li>
 </ul>
 
-<p>If you scroll down, you can see the Fragment Profiles and Operator Profiles sections.
</p>
+<p>Further down you can see the Fragment Profiles and Operator Profiles sections. </p>
 
 <h2 id="fragment-profiles">Fragment Profiles</h2>
 
-<p>Fragment profiles section provides an overview table, and a major fragment block
for each major fragment that executed the query. Each row in the Overview table provides the
number of minor fragments that Drill parallelized from each major fragment, as well as aggregate
time and memory metrics for the minor fragments.  </p>
+<p>Fragment profiles provides an overview table and a major fragment block for each
major fragment. Each row in the Overview table provides the number of minor fragments that
Drill parallelized from each major fragment, as well as aggregate time and memory metrics
for the minor fragments.  </p>
 
 <p><img src="/docs/img/frag_profile.png" alt="">  </p>
 
-<p>See Major Fragment Profiles Table for column descriptions.</p>
-
 <p>When you look at the fragment profiles, you may notice that some major fragments
were parallelized into substantially fewer minor fragments, but happen to have the highest
runtime.  Or, you may notice certain minor fragments have a higher peak memory than others.
When you notice these variations in execution, you can delve deeper into the profile by looking
at the major fragment blocks.</p>
 
-<p>Below the Overview table are major fragment blocks. Each of these blocks corresponds
to a row in the Overview table. You can expand the blocks to see metrics for all of the minor
fragments that were parallelized from each major fragment, including the host on which each
minor fragment ran. Each row in the major fragment table presents the fragment state, time
metrics, memory metrics, and aggregate input metrics of each minor fragment.  </p>
+<p>Major fragment blocks correspond to a row in the Overview table. You can expand
the blocks to see metrics for all of the minor fragments that were parallelized from each
major fragment, including the host on which each minor fragment ran. Each row in the major
fragment table presents the fragment state, time metrics, memory metrics, and aggregate input
metrics of each minor fragment.  </p>
 
 <p><img src="/docs/img/maj_frag_block.png" alt="">  </p>
 
-<p>When looking at the minor fragment metrics, verify the state of the fragment. A
fragment can have a “failed” state which could indicate an issue on the host. If the query
itself fails, an operator may have run out of memory. If fragments running on a particular
node are under performing, there may be multi-tenancy issues that you can address.</p>
+<p>When looking at the minor fragment metrics, verify the state of the fragment. A
fragment can have a “failed” state which may indicate an issue on the host. If the query
itself fails, an operator may have run out of memory. If fragments running on a particular
node are under performing, there may be multi-tenancy issues that you can address.</p>
 
-<p>You can also see a graph that illustrates the activity of major and minor fragments
for the duration of the query.  </p>
+<p>A graph illustrates the activity of major and minor fragments for the duration of
the query. The graph correlates with the visualized plan graph in the Visualized Plan tab.
Each color in the graph corresponds to the activity of one major fragment. </p>
 
 <p><img src="/docs/img/graph_1.png" alt="">  </p>
 
-<p>If you see “stair steps” in the graph, this indicates that the execution work
of the fragments is not distributed evenly. Stair steps in the graph typically occur for non-local
reads on data. To address this issue, you can increase data replication, rewrite the data,
or file a JIRA to get help with the issue.</p>
-
-<p>This graph correlates with the visualized plan graph in the Visualized Plan tab.
Each color in the graph corresponds to the activity of one major fragment.  </p>
+<p>Stair steps in the graph indicate that the execution work of the fragments is not
distributed evenly. This typically occurs for non-local reads on data. To address this issue,
you can increase data replication, rewrite the data, or file a JIRA to get help with the issue.</p>
 
 <p><img src="/docs/img/vis_graph.png" alt="">  </p>
 
 <p>The visualized plan illustrates color-coded major fragments divided and labeled
with the names of the operators used to complete each phase of the query. Exchange operators
separate each major fragment. These operators represent a point where Drill can execute operations
below them in parallel.  </p>
 
-<h2 id="operator-profiles">Operator Profiles</h2>
+<h3 id="operator-profiles">Operator Profiles</h3>
 
 <p>Operator profiles describe each operator that performed relational operations during
query execution. The Operator Profiles section provides an Overview table of the aggregate
time and memory metrics for each operator within a major fragment.  </p>
 
 <p><img src="/docs/img/operator_table.png" alt="">  </p>
 
-<p>See Operator Profiles Table for column descriptions.</p>
-
 <p>Identify the operations that consume a majority of time and memory. You can potentially
modify options related to the specific operators to improve performance.</p>
 
 <p>Below the Overview table are operator blocks, which you can expand to see metrics
for each operator. Each of these blocks corresponds to a row in the Overview table. Each row
in the Operator block presents time and memory metrics, as well as aggregate input metrics
for each minor fragment.  </p>
 
 <p><img src="/docs/img/operator_block.png" alt="">  </p>
 
-<p>See Operator Block for column descriptions.</p>
-
-<p>Drill uses batches of records as a basic unit of work. The batches are pipelined
between each operation.  Record batches are no larger than 64k records. While the target size
of one record batch is generally 256k, they can scale to many megabytes depending on the query
plan and the width of the records.</p>
+<p>Drill uses batches of records as a basic unit of work. The batches are pipe-lined
between each operation. Record batches are no larger than 64K records. While the target size
of one record batch is generally 256K, they can scale to many megabytes depending on the query
plan and the width of the records.</p>
 
 <p>The Max Records number for each minor fragment should be almost equivalent. If one,
or a very small number of minor fragments, perform the majority of the work, there may be
data skew. To address data skew, you may need change settings related to table joins or partition
data to balance the work.  </p>
 
-<h3 id="data-skew-example">Data Skew Example</h3>
-
-<p>The following query was run against TPC-DS data:</p>
+<p><strong>Data Skew Example</strong><br>
+The following query was run against TPC-DS data:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">
  0: jdbc:drill:zk=local&gt; select ss_customer_sk, count(*) as cnt from store_sales where
ss_customer_sk is null or ss_customer_sk in (1, 2, 3, 4, 5) group by ss_customer_sk;
    +-----------------+---------+
    | ss_customer_sk  |   cnt   |
@@ -1241,22 +1230,22 @@
 
 <p>In this example, there is inherent skew present in the data. Other types of skew
may not strictly be data dependent, but can be introduced by a sub-optimal hash function or
other issues in the product. In either case, examining the query profile helps understand
why a query is slow. In the first scenario, it may be possible to run separate queries for
the skewed and non-skewed values. In the second scenario, it is better to seek technical support.
 </p>
 
-<h2 id="physical-plan-view">Physical Plan View</h2>
+<h3 id="physical-plan-view">Physical Plan View</h3>
 
-<p>The physical plan view provides statistics about the actual cost of the query operations
in terms of memory, I/O, and CPU processing. You can use this profile to identify which operations
consumed the majority of the resources during a query, modify the physical plan to address
the cost-intensive operations, and submit the updated plan back to Drill. See <a href="/docs/explain/#costing-information">Costing
Information</a>.  </p>
+<p>The Physical Plan view provides statistics about the actual cost of the query operations
in terms of memory, I/O, and CPU processing. You can use this profile to identify which operations
consumed the majority of the resources during a query, modify the physical plan to address
the cost-intensive operations, and submit the updated plan back to Drill. See <a href="/docs/explain/#costing-information">Costing
Information</a>.  </p>
 
 <p><img src="/docs/img/phys_plan_profile.png" alt="">  </p>
 
-<h2 id="canceling-a-query">Canceling a Query</h2>
+<h3 id="canceling-a-query">Canceling a Query</h3>
 
-<p>You may want to cancel a query if it hangs or causes performance bottlenecks. You
can cancel a query in the Profile tab of the Drill Web Console.</p>
+<p>You may want to cancel a query if it hangs or causes performance bottlenecks. You
can cancel a query from the Profile tab of the Drill Web Console.</p>
 
-<p>To cancel a query from the Drill Web Console, complete the following steps:  </p>
+<p>To cancel a query, complete the following steps:  </p>
 
 <ol>
 <li>Navigate to the Drill Web Console at <code>&lt;drill_node_ip_address&gt;:8047</code>.
 The Drill node from which you access the Drill Web Console must have an active Drillbit running.</li>
-<li>Select Profiles in the toolbar.
+<li>Select <strong>Profiles</strong> in the toolbar.
 A list of running and completed queries appears.</li>
 <li>Click the query for which you want to see the profile.</li>
 <li>Select <strong>Edit Query</strong>.</li>
@@ -1264,7 +1253,7 @@ A list of running and completed queries appears.</li>
 </ol>
 
 <p>The following message appears:  </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">
  Cancelled query &lt;QueryID\&gt;
+<div class="highlight"><pre><code class="language-text" data-lang="text">Cancelled
query &lt;QueryID\&gt;
 </code></pre></div>
     
       

http://git-wip-us.apache.org/repos/asf/drill-site/blob/bf475de9/docs/start-up-options/index.html
----------------------------------------------------------------------
diff --git a/docs/start-up-options/index.html b/docs/start-up-options/index.html
index 66f9d13..7e54b86 100644
--- a/docs/start-up-options/index.html
+++ b/docs/start-up-options/index.html
@@ -1126,7 +1126,7 @@
 
     </div>
 
-     Apr 14, 2016
+     Aug 8, 2017
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1173,16 +1173,20 @@ file tells Drill to scan that JAR file or associated object and include
it.</p>
 <p>The summary of start-up options, also known as boot options, lists default values.
The following descriptions provide more detail on key options that are frequently reconfigured:</p>
 
 <ul>
-<li>drill.exec.http.ssl_enabled<br>
+<li><strong>drill.exec.http.ssl_enabled</strong><br>
 Available in Drill 1.2. Enables or disables <a href="/docs/configuring-web-console-and-rest-api-security/#https-support">HTTPS
support</a>. Settings are TRUE and FALSE, respectively. The default is FALSE.<br></li>
-<li>drill.exec.sys.store.provider.class<br>
+<li><strong>drill.exec.sys.store.provider.class</strong><br>
 Defines the persistent storage (PStore) provider. The <a href="/docs/persistent-configuration-storage">PStore</a>
holds configuration and profile data.<br></li>
-<li>drill.exec.buffer.size<br>
+<li><strong>drill.exec.buffer.size</strong><br>
 Defines the amount of memory available, in terms of record batches, to hold data on the downstream
side of an operation. Drill pushes data downstream as quickly as possible to make data immediately
available. This requires Drill to use memory to hold the data pending operations. When data
on a downstream operation is required, that data is immediately available so Drill does not
have to go over the network to process it. Providing more memory to this option increases
the speed at which Drill completes a query.<br></li>
-<li>drill.exec.sort.external.spill.directories<br>
+<li><strong>drill.exec.sort.external.spill.directories</strong><br>
 Tells Drill which directory to use when spooling. Drill uses a spool and sort operation for
beyond memory operations. The sorting operation is designed to spool to a Hadoop file system.
The default Hadoop file system is a local file system in the <code>/tmp</code>
directory. Spooling performance (both writing and reading back from it) is constrained by
the file system.<br></li>
-<li>drill.exec.zk.connect<br>
-Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting
to point to the ZooKeeper quorum that you want Drill to use. You must configure this option
on each Drillbit node.</li>
+<li><strong>drill.exec.zk.connect</strong><br>
+Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting
to point to the ZooKeeper quorum that you want Drill to use. You must configure this option
on each Drillbit node.<br></li>
+<li><strong>drill.exec.profiles.store.inmemory</strong><br>
+Available as of Drill 1.11. When set to TRUE, enables Drill to store query profiles in memory
instead of writing the query profiles to disk. When set to FALSE, Drill writes the profile
for each query to disk, which is either the local file system or a distributed file system,
such as HDFS. For sub-second queries, writing the query profile to disk is expensive due to
the interactions with the file system. Enable this option if you want Drill to store the profiles
of sub-second queries in memory instead of writing them to disk. When you enable this option,
Drill stores the profiles in memory for as long as the drillbit runs. When the drillbit restarts,
the profiles no longer exist. You can set the maximum number of most recent profiles to retain
in memory through the drill.exec.profiles.store.capacity option. Settings are TRUE and FALSE.
Default is FALSE.<br></li>
+<li><strong>drill.exec.profiles.store.capacity</strong><br>
+Available as of Drill 1.11. Sets the maximum number of most recent profiles to retain in
memory when the drill.exec.profiles.store.inmemory option is enabled. Default is 1000.<br></li>
 </ul>
 
     

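Note (not part of the commit): the two profile-store options documented in the start-up options diff above are boot options, so they would normally be set in drill-override.conf. A minimal sketch, assuming a typical single-node layout; the cluster-id and zk.connect values are placeholders, and 1000 is the documented default capacity:

```
drill.exec: {
  cluster-id: "drillbits1",          # placeholder, use your own cluster ID
  zk.connect: "localhost:2181",      # placeholder ZooKeeper quorum
  profiles.store.inmemory: true,     # keep query profiles in memory (Drill 1.11+)
  profiles.store.capacity: 1000      # max number of recent profiles retained in memory
}
```

As the documentation text in the diff states, profiles stored this way exist only for as long as the drillbit runs. Boot options are read at start-up, so the drillbit most likely needs a restart before the change takes effect.
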
http://git-wip-us.apache.org/repos/asf/drill-site/blob/bf475de9/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index aab8d74..16438e3 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Mon, 07 Aug 2017 12:23:58 -0700</pubDate>
-    <lastBuildDate>Mon, 07 Aug 2017 12:23:58 -0700</lastBuildDate>
+    <pubDate>Mon, 07 Aug 2017 19:25:51 -0700</pubDate>
+    <lastBuildDate>Mon, 07 Aug 2017 19:25:51 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>

