drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [drill] paul-rogers commented on a change in pull request #1953: Add docs for Drill Metastore
Date Wed, 15 Jan 2020 05:05:52 GMT
paul-rogers commented on a change in pull request #1953: Add docs for Drill Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r366687593
 
 

 ##########
 File path: _docs/sql-reference/sql-commands/007-analyze-table-refresh-metadata.md
 ##########
 @@ -0,0 +1,158 @@
+---
+title: "ANALYZE TABLE REFRESH METADATA"
+parent: "SQL Commands"
+date: 2020-01-13
+---
+
+Starting from Drill 1.17, you can store table metadata (including schema and computed statistics)
into Drill Metastore.
+This metadata will be used when querying a table for more optimal plan creation.
+
+{% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only
and is disabled by default.{% include endnote.html %}
+
+To enable Drill Metastore usage, the following option `metastore.enabled` should be set to
`true`, as shown:
+
+	SET `metastore.enabled` = true;
+
+Alternatively, you can enable the option in the Drill Web UI at `http://<drill-hostname-or-ip-address>:8047/options`.
+
+## Syntax
+
+The ANALYZE TABLE REFRESH METADATA statement supports the following syntax:
+
+	ANALYZE TABLE [table_name] [COLUMNS {(col1, col2, ...) | NONE}]
+	REFRESH METADATA ['level' LEVEL]
+	[{COMPUTE | ESTIMATE} | STATISTICS [(column1, column2, ...)]
+	[ SAMPLE number PERCENT ]]
+
+## Parameters
+
+*table_name*
+The name of the table or directory for which Drill will collect table metadata. If the table
does not exist, or the table
+ is temporary, the command fails and metadata is not collected and stored.
+
+*COLUMNS (col1, col2, ...)*
+Optional names of the column(s) for which Drill will generate and store metadata. the Stored
schema will include all table columns.
+
+*COLUMNS NONE*
+Specifies to ignore collecting and storing metadata for all table columns.
+
+*level*
+Optional varchar literal which specifies maximum level depth for collecting metadata.
+Possible values: `TABLE`, `SEGMENT`, `PARTITION`, `FILE`, `ROW_GROUP`, `ALL`. Default is
`ALL`.
+
+*COMPUTE*
+Generates statistics for the table to be stored into the Metastore.
+If statistics usage is disabled (`planner.enable_statistics` is set to `false`), an error
will be thrown when this clause is specified.
+
+*ESTIMATE*
+Generates estimated statistics for the table to be stored into the Metastore. Currently is
not supported.
+
+*(column1, column2, ...)*
+The name of the column(s) for which Drill will generate statistics.
+
+*SAMPLE*
+Optional. Indicates that compute statistics should run on a subset of the data.
+
+*number PERCENT*  
+An integer that specifies the percentage of data on which to compute statistics. For example,
if a table has 100 rows, `SAMPLE 50 PERCENT` indicates that statistics should be computed
on 50 rows. The optimizer selects the rows at random. 
+
+## Related Options
+
+- **metastore.enabled**
+Enables Drill Metastore usage to be able to store table metadata during ANALYZE TABLE commands
execution and to be able
+ to read table metadata during regular queries execution or when querying some INFORMATION_SCHEMA
tables. Default is `false`.
+- **metastore.metadata.store.depth_level**
+Specifies maximum level depth for collecting metadata. Default is `'ALL'`.
+- **metastore.retrieval.retry_attempts**
+Specifies the number of attempts for retrying query planning after detecting that query metadata
is changed.
+If the number of retries was exceeded, query will be planned without metadata information
from the Metastore. Default is 5.
+- **metastore.metadata.fallback_to_file_metadata**
+Allows using file metadata cache for the case when required metadata is absent in the Metastore.
Default is true.
+- **metastore.metadata.use_schema**
+Enables schema usage, stored to the Metastore. Default is `true`.
+- **metastore.metadata.use_statistics**
+Enables statistics usage, stored in the Metastore, at the planning stage. Default is `true`.
+- **metastore.metadata.ctas.auto-collect**
+Specifies whether schema and / or column statistics will be automatically collected for every
table after CTAS and CTTAS.
+This option is not active for now. Default is `'NONE'`.
+- **drill.exec.storage.implicit.last_modified_time.column.label**
+Sets the implicit column name for the last modified time (`lmt`) column. For internal usage
when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_index.column.label**
+Sets the implicit column name for the row group index (`rgi`) column. For internal usage
when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_length.column.label**
+Sets the implicit column name for the row group length (`rgl`) column. For internal usage
when producing Metastore analyze.
+- **drill.exec.storage.implicit.row_group_start.column.label**
+Sets the implicit column name for the row group start (`rgs`) column. For internal usage
when producing Metastore analyze.
+
+## Related Commands
+
+To drop table metadata from the Metastore, the following command may be used:
+
+	ANALYZE TABLE [table_name] DROP [METADATA|STATISTICS] [IF EXISTS]
+
+It will not throw an exception for absent table metadata if `IF EXISTS` clause was specified.
+
+	ANALYZE TABLE [workspace.]table_name COMPUTE STATISTICS [(column1, column2,...)] [SAMPLE
number PERCENT]
 
 Review comment:
   (Is `workspace.` correct? A workspace is something unique to a session. Do we mean `plugin.schema`
instead?
   
   Also, what does happen if the user works via a workspace? Does Drill work out the underlying
plugin and schema name?
   
   What happens if the user analyzes a table, then removes or changes the plugin or schema?
Does the data remain in the Metastore? Can the user remove it, given that the plugin and schema
names are now invalid?
   
   Sadly, altering a plugin or schema is not done with a SQL command, so there is no way to
prevent changes while metadata is present, or trigger metadata removal.
   
   How does the user clean up in this case?)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message