hudi-commits mailing list archives

From "Yanjia Gary Li (Jira)" <j...@apache.org>
Subject [jira] [Created] (HUDI-597) Enable incremental pulling from defined partitions
Date Tue, 04 Feb 2020 00:22:00 GMT
Yanjia Gary Li created HUDI-597:
-----------------------------------

             Summary: Enable incremental pulling from defined partitions
                 Key: HUDI-597
                 URL: https://issues.apache.org/jira/browse/HUDI-597
             Project: Apache Hudi (incubating)
          Issue Type: New Feature
            Reporter: Yanjia Gary Li
            Assignee: Yanjia Gary Li


When I only need the incremental changes from certain partitions, I currently have to run
the incremental pull against the entire dataset first and then filter the result in Spark.

If the partition folders could be passed directly as part of the input path, the pull could
run faster by loading only the relevant parquet files.

Example:

 
{code:java}
// Proposed usage: pass the partition glob together with the base path
spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
  .load(path, "year=2020/*/*/*")
{code}
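
To illustrate why a partition glob such as "year=2020/*/*/*" narrows the set of files to read, here is a minimal sketch that uses only the Java standard library. It is not Hudi's implementation: the class name, the method name, and the file paths are hypothetical, and it assumes a Unix-style file system where "/" separates path components.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionGlobSketch {

    // Keeps only the relative file paths that match the partition glob.
    // In glob syntax, "*" matches within one path component, so
    // "year=2020/*/*/*" matches paths with exactly four components
    // whose first component is "year=2020".
    public static List<String> selectByGlob(List<String> relativePaths, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> selected = new ArrayList<>();
        for (String p : relativePaths) {
            if (matcher.matches(Paths.get(p))) {
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical files touched since the begin instant, relative to the base path.
        List<String> changedFiles = Arrays.asList(
            "year=2020/month=01/day=15/part-0001.parquet",
            "year=2019/month=12/day=31/part-0002.parquet",
            "year=2020/month=02/day=01/part-0003.parquet");

        // Only the files under year=2020 are kept for reading.
        for (String f : selectByGlob(changedFiles, "year=2020/*/*/*")) {
            System.out.println(f);
        }
    }
}
```

Applying the glob before any parquet file is opened is what would save the work; the current approach reads every changed file and discards the unwanted rows afterwards.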
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
