hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1261: [HUDI-403] Adds guidelines on deployment/upgrading
Date Tue, 21 Jan 2020 02:29:34 GMT
lamber-ken commented on a change in pull request #1261: [HUDI-403] Adds guidelines on deployment/upgrading
URL: https://github.com/apache/incubator-hudi/pull/1261#discussion_r368786695
 
 

 ##########
 File path: docs/_docs/2_6_deployment.md
 ##########
 @@ -1,51 +1,87 @@
 ---
-title: Administering Hudi Pipelines
-keywords: hudi, administration, operation, devops
-permalink: /docs/admin_guide.html
-summary: This section offers an overview of tools available to operate an ecosystem of Hudi datasets
+title: Deployment Guide
+keywords: hudi, administration, operation, devops, deployment
+permalink: /docs/deployment.html
+summary: This section offers an overview of tools available to operate an ecosystem of Hudi
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
-Admins/ops can gain visibility into Hudi datasets/pipelines in the following ways
+This section provides all the help you need to deploy and operate Hudi tables at scale. 
+Specifically, we will cover the following aspects.
 
- - [Administering via the Admin CLI](#admin-cli)
- - [Graphite metrics](#metrics)
- - [Spark UI of the Hudi Application](#spark-ui)
+ - [Deployment Model](#deploying) : How various Hudi components are deployed and managed.
+ - [Upgrading Versions](#upgrading) : Picking up new releases of Hudi, guidelines and general best-practices
+ - [Migrating to Hudi](#migrating) : How to migrate your existing tables to Apache Hudi.
+ - [Interacting via CLI](#cli) : Using the CLI to perform maintenance or deeper introspection
+ - [Monitoring](#monitoring) : Tracking metrics from your hudi tables using popular tools.
+ - [Troubleshooting](#troubleshooting) : Uncovering, triaging and resolving issues in production.
+ 
+## Deploying
 
-This section provides a glimpse into each of these, with some general guidance on [troubleshooting](#troubleshooting)
+All in all, Hudi deploys with no long running servers or additional infrastructure cost to
your data lake. In fact, Hudi pioneered this model of building a transactional distributed
storage layer
+using existing infrastructure and its heartening to see other systems adopting similar approaches
as well. Hudi writing is done via Spark jobs (DeltaStreamer or custom Spark datasource jobs),
deployed per standard Apache Spark [recommendations](https://spark.apache.org/docs/latest/cluster-overview.html).
+Querying Hudi tables happens via libraries installed into Apache Hive, Apache Spark or Presto
and hence no additional infrastructure is necessary. 
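As a concrete illustration of the writing path described above, a DeltaStreamer ingestion job is launched as a plain Spark application via `spark-submit`. This is only a sketch: the bucket, paths, table name, and properties file below are hypothetical placeholders, and the bundle jar version and exact option names should be checked against the Hudi release actually being deployed.

```shell
# Sketch: running DeltaStreamer as a standard Spark job (no extra servers needed).
# All s3:// paths, the table name, and the jar version are illustrative placeholders;
# verify the class name and flags against the docs for your Hudi release.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle_2.11-0.5.1-incubating.jar \
  --props s3://my-bucket/config/kafka-source.properties \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path s3://my-bucket/hudi/my_table \
  --target-table my_table
```

Because this is a regular Spark application, cluster sizing, scheduling, and monitoring follow the standard Spark deployment recommendations linked above.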
 
-## Admin CLI
 
-Once hudi has been built, the shell can be fired by via  `cd hudi-cli && ./hudi-cli.sh`.
-A hudi dataset resides on DFS, in a location referred to as the **basePath** and we would need this location in order to connect to a Hudi dataset.
-Hudi library effectively manages this dataset internally, using .hoodie subfolder to track all metadata
+## Upgrading 
+
+New Hudi releases are listed on the [releases page](/releases), with detailed notes that list all the changes and highlights in each release. 
+At the end of the day, Hudi is a storage system and with that comes a lot of responsibility, which we take seriously. 
+
+As general guidelines, 
+
+ - We strive to keep all changes backwards compatible (i.e., new code can read old data/timeline files), and where we cannot, we will provide upgrade/downgrade tools via the CLI
+ - We cannot always guarantee forward compatibility (i.e., old code being able to read data/timeline files written by a greater version). This is generally the norm, since no new features can be built otherwise.
+   However, any such large changes will be turned off by default, for a smooth transition to the newer release. After a few releases, once enough users deem the feature stable in production, we will flip the defaults in a subsequent release.
+ - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first and then upgrade the writers (deltastreamer, spark jobs using datasource). This often provides the best experience, and it's easy to fix any issues by rolling the writer code forward/back (which you typically have more control over).
+ - With large, feature-rich releases, we recommend migrating slowly by first testing in staging environments and running your own tests. Upgrading Hudi is no different than upgrading any database system.
+
+Note that release notes can override this information with specific instructions, applicable on a case-by-case basis.
+
+## Migrating
+
+Currently migrating to Hudi can be done using two approaches 
 
 Review comment:
   Hi, missing `.` at the end of the statement.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
