hudi-commits mailing list archives

From le...@apache.org
Subject [incubator-hudi] branch asf-site updated: [HUDI-611] Add Impala Guide to Doc (#1349)
Date Sun, 23 Feb 2020 02:19:32 GMT
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 49047de  [HUDI-611] Add Impala Guide to Doc (#1349)
49047de is described below

commit 49047deb43bc400ccc0fa6b2d02f2fe379d1c8a4
Author: YanJia-Gary-Li <yanjia.gary.li@gmail.com>
AuthorDate: Sat Feb 22 18:19:25 2020 -0800

    [HUDI-611] Add Impala Guide to Doc (#1349)
---
 docs/_docs/2_3_querying_data.cn.md | 30 ++++++++++++++++++++++++++++++
 docs/_docs/2_3_querying_data.md    | 29 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/docs/_docs/2_3_querying_data.cn.md b/docs/_docs/2_3_querying_data.cn.md
index 81f2273..b2c4870 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -145,3 +145,33 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
 
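 Presto is a commonly used query engine that provides interactive query performance. Hudi RO tables can be queried seamlessly in Presto.
 This requires placing the `hudi-presto-bundle` jar into `<presto_install>/plugin/hive-hadoop2/` across the installation.
+
+## Impala (this feature is not yet officially released)
+
+### Read optimized table
+
+Impala can query a Hudi read optimized table on HDFS as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables).  
+To create a Hudi read optimized table on Impala: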
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+```
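+Impala can take advantage of a sensible file partition layout to improve query efficiency.
+To create a partitioned table, the folders need to be named in the form `year=2020/month=1`.
+Impala uses `=` to separate the partition name from the partition value.  
+To create a partitioned Hudi read optimized table on Impala: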
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+PARTITIONED BY (year int, month int, day int)
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+ALTER TABLE database.table_name RECOVER PARTITIONS;
+```
+After Hudi successfully writes a new commit, refresh the Impala table to get the latest results.
+```
+REFRESH database.table_name;
+```
+
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 2d97e2b..0ee5e17 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -150,3 +150,32 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 This requires the `hudi-presto-bundle` jar to be placed into `<presto_install>/plugin/hive-hadoop2/`, across the installation.
+
+## Impala (Not Officially Released)
+
+### Read optimized table
+
+Impala is able to query a Hudi read optimized table as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables) on HDFS.  
+To create a Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+```
+Impala can take advantage of the physical partition structure to improve query performance.
+To create a partitioned table, the folders should follow a naming convention like `year=2020/month=1`.
+Impala uses `=` to separate the partition name from the partition value.  
+To create a partitioned Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+PARTITIONED BY (year int, month int, day int)
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+ALTER TABLE database.table_name RECOVER PARTITIONS;
+```
+After Hudi makes a new commit, refresh the Impala table to get the latest results.
+```
+REFRESH database.table_name;
+```
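
The folder naming convention described in the guide (`name=value` segments, one per partition column) can be illustrated with a small sketch. This is not part of the commit or of Hudi or Impala; the helper names `partition_dir` and `parse_partition_dir` are hypothetical, and the `year`/`month`/`day` columns and `/path/to/load` base path are simply taken from the example DDL above.

```python
from typing import Dict, Sequence, Tuple


def partition_dir(base: str, parts: Sequence[Tuple[str, object]]) -> str:
    """Build a partition directory of the form <base>/<name>=<value>/...

    `parts` is an ordered list of (partition name, partition value) pairs;
    the order should match the partition columns declared on the table.
    """
    segments = [f"{name}={value}" for name, value in parts]
    return "/".join([base.rstrip("/")] + segments)


def parse_partition_dir(base: str, path: str) -> Dict[str, str]:
    """Recover partition names and values from a directory under `base`.

    Each segment is split on the first `=`, mirroring how the name and
    value are separated in the naming convention. Values stay as strings.
    """
    relative = path[len(base.rstrip("/")):].strip("/")
    return dict(seg.split("=", 1) for seg in relative.split("/") if seg)


# Hypothetical layout for a table partitioned by (year, month, day).
path = partition_dir("/path/to/load", [("year", 2020), ("month", 1), ("day", 15)])
print(path)
print(parse_partition_dir("/path/to/load", path))
```

Data files written under directories shaped like this are what the `RECOVER PARTITIONS` statement in the example is meant to pick up; directories that do not follow the `name=value` pattern would not map to partition columns.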

