hudi-users mailing list archives

From Gary Li <yanjia.gary...@gmail.com>
Subject [ANNOUNCE] Apache Hudi 0.8.0 released
Date Sat, 10 Apr 2021 12:53:00 GMT
Hi All,

The Apache Hudi team is pleased to announce the release of Apache Hudi
0.8.0.

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
Incrementals. Apache Hudi manages storage of large analytical datasets on
DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage) and
provides the ability to query them.

Since the 0.7.0 release, we have resolved 97 JIRA tickets and made 120 code
commits, delivering many new features, bug fixes, and performance
improvements. Thanks to all the contributors who made this happen.

The release notes can be found here: https://hudi.apache.org/releases.html

*Release Highlights*

*Flink Integration*
Since the initial support for the Hudi Flink writer in the 0.7.0 release,
the Hudi community has made great progress on the Flink/Hudi integration,
including a redesigned Flink writer pipeline with better performance and
scalability, state-backed indexing with bootstrap support, a Flink writer
for MOR tables, a batch reader for COW and MOR tables, a streaming reader
for MOR tables, and a Flink SQL connector for both source and sink. In the
0.8.0 release, all of these features are available with Flink 1.11+.

Please see [RFC-24](
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+24%3A+Hoodie+Flink+Writer+Proposal)
for more implementation details of the Flink writer and follow this [page](
https://hudi.apache.org/docs/flink-quick-start-guide.html) to get started
with Flink!
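
For a concrete feel of the SQL connector, here is a minimal Scala sketch
(assuming Flink 1.11+ with the Blink planner and the Hudi Flink bundle on the
classpath) that registers a Hudi MERGE_ON_READ table and streams generated
rows into it. The table name, schema, path, and datagen source are
illustrative placeholders; see the quick start guide linked above for the
authoritative connector options.

```scala
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

object FlinkHudiSketch {
  def main(args: Array[String]): Unit = {
    // Flink Table API in streaming mode with the Blink planner (Flink 1.11+).
    val settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val tableEnv = TableEnvironment.create(settings)

    // Hypothetical Hudi MOR table registered through the Flink SQL connector.
    tableEnv.executeSql(
      """CREATE TABLE hudi_trips (
        |  uuid STRING,
        |  rider STRING,
        |  fare DOUBLE,
        |  ts TIMESTAMP(3),
        |  `partition` STRING
        |)
        |PARTITIONED BY (`partition`)
        |WITH (
        |  'connector' = 'hudi',
        |  'path' = 'file:///tmp/hudi_trips',
        |  'table.type' = 'MERGE_ON_READ'
        |)""".stripMargin)

    // Throwaway datagen source so there is something to write.
    tableEnv.executeSql(
      """CREATE TABLE trips_source (
        |  uuid STRING,
        |  rider STRING,
        |  fare DOUBLE,
        |  ts TIMESTAMP(3),
        |  `partition` STRING
        |) WITH (
        |  'connector' = 'datagen',
        |  'rows-per-second' = '5'
        |)""".stripMargin)

    // Continuous insert into the Hudi table (the sink side of the connector).
    tableEnv.executeSql("INSERT INTO hudi_trips SELECT * FROM trips_source")
  }
}
```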

*Parallel Writers Support*
As many users requested, now Hudi supports multiple ingestion writers to
the same Hudi Table with optimistic concurrency control. Hudi supports file
level OCC, i.e., for any 2 commits (or writers) happening to the same
table, if they do not have writes to overlapping files being changed, both
writers are allowed to succeed. This feature is currently experimental and
requires either Zookeeper or HiveMetastore to acquire locks.

Please see [RFC-22](
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers)
for more implementation details and follow this [page](
https://hudi.apache.org/docs/concurrency_control.html) to get started with
concurrency control!
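
As a rough illustration, the Scala sketch below shows a Spark datasource
write with the optimistic concurrency control options switched on and a
ZooKeeper-based lock provider. The table name, record key, input path,
ZooKeeper host, and base paths are placeholders, and the exact option keys
should be verified against the concurrency control page linked above.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object MultiWriterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-occ-sketch")
      .getOrCreate()

    // Hypothetical incoming batch; any DataFrame with the expected columns works.
    val df = spark.read.parquet("/tmp/incoming_batch")

    df.write.format("hudi")
      .option("hoodie.table.name", "trips")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      // Optimistic concurrency control (experimental in 0.8.0).
      .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
      .option("hoodie.cleaner.policy.failed.writes", "LAZY")
      // ZooKeeper-based lock provider; HiveMetastore-based locking is the alternative.
      .option("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
      .option("hoodie.write.lock.zookeeper.url", "zk-host")
      .option("hoodie.write.lock.zookeeper.port", "2181")
      .option("hoodie.write.lock.zookeeper.lock_key", "trips")
      .option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks")
      .mode(SaveMode.Append)
      .save("/tmp/hudi_trips")
  }
}
```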

*Writer side improvements*
- InsertOverwrite support for the Flink writer client.
- Support for CopyOnWrite tables in the Java writer client.

*Query side improvements*
- Support for Spark Structured Streaming reads from Hudi tables (see the sketch after this list).
- Performance improvements to the metadata table.
- Performance improvements to clustering.
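
Below is a minimal Scala sketch of the new streaming read path, treating an
existing Hudi table as a Spark Structured Streaming source and echoing the
incoming batches to the console. The base path and checkpoint location are
placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HudiStreamingReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-streaming-read-sketch")
      .getOrCreate()

    // Treat an existing Hudi table (placeholder base path) as a streaming source.
    val streamDf = spark.readStream
      .format("hudi")
      .load("/tmp/hudi_trips")

    // Echo incoming commits to the console; the checkpoint path is a placeholder.
    streamDf.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/hudi_stream_checkpoint")
      .start()
      .awaitTermination()
  }
}
```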

*Raw Release Notes*
The raw release notes are available [here](
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12349423
)

Thanks,
Gary Li
(on behalf of the Hudi community)
