From: GitBox
To: commits@hudi.apache.org
Subject: [GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1261: [HUDI-403] Adds guidelines on deployment/upgrading
Date: Tue, 21 Jan 2020 02:29:34 -0000

lamber-ken commented on a change in pull request #1261: [HUDI-403] Adds guidelines on deployment/upgrading
URL: https://github.com/apache/incubator-hudi/pull/1261#discussion_r368786695


##########
File path: docs/_docs/2_6_deployment.md
##########

@@ -1,51 +1,87 @@
 ---
-title: Administering Hudi Pipelines
-keywords: hudi, administration, operation, devops
-permalink: /docs/admin_guide.html
-summary: This section offers an overview of tools available to operate an ecosystem of Hudi datasets
+title: Deployment Guide
+keywords: hudi, administration, operation, devops, deployment
+permalink: /docs/deployment.html
+summary: This section offers an overview of tools available to operate an ecosystem of Hudi tables
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
-Admins/ops can gain visibility into Hudi datasets/pipelines in the following ways
+This section provides all the help you need to deploy and operate Hudi tables at scale.
+Specifically, we will cover the following aspects.
 
- - [Administering via the Admin CLI](#admin-cli)
- - [Graphite metrics](#metrics)
- - [Spark UI of the Hudi Application](#spark-ui)
+ - [Deployment Model](#deploying) : How various Hudi components are deployed and managed.
+ - [Upgrading Versions](#upgrading) : Picking up new releases of Hudi; guidelines and general best practices.
+ - [Migrating to Hudi](#migrating) : How to migrate your existing tables to Apache Hudi.
+ - [Interacting via CLI](#cli) : Using the CLI to perform maintenance or deeper introspection.
+ - [Monitoring](#monitoring) : Tracking metrics from your Hudi tables using popular tools.
+ - [Troubleshooting](#troubleshooting) : Uncovering, triaging and resolving issues in production.
+
+## Deploying
 
-This section provides a glimpse into each of these, with some general guidance on [troubleshooting](#troubleshooting)
+All in all, Hudi deploys with no long-running servers or additional infrastructure cost to your data lake. In fact, Hudi pioneered this model of building a transactional distributed storage layer
+using existing infrastructure, and it's heartening to see other systems adopting similar approaches as well.
+Hudi writing is done via Spark jobs (DeltaStreamer or custom Spark datasource jobs), deployed per standard Apache Spark [recommendations](https://spark.apache.org/docs/latest/cluster-overview.html).
+Querying Hudi tables happens via libraries installed into Apache Hive, Apache Spark, or Presto, and hence no additional infrastructure is necessary.
 
-## Admin CLI
-Once hudi has been built, the shell can be fired by via `cd hudi-cli && ./hudi-cli.sh`.
-A hudi dataset resides on DFS, in a location referred to as the **basePath** and we would need this location in order to connect to a Hudi dataset.
-Hudi library effectively manages this dataset internally, using .hoodie subfolder to track all metadata
+## Upgrading
+
+New Hudi releases are listed on the [releases page](/releases), with detailed notes which list all the changes, with highlights in each release.
+At the end of the day, Hudi is a storage system and with that comes a lot of responsibilities, which we take seriously.
+
+As general guidelines,
+
+ - We strive to keep all changes backwards compatible (i.e. new code can read old data/timeline files); where we cannot, we will provide upgrade/downgrade tools via the CLI.
+ - We cannot always guarantee forward compatibility (i.e. old code being able to read data/timeline files written by a greater version). This is generally the norm, since no new features can be built otherwise.
+   However, any such large changes will be turned off by default, for a smooth transition to the newer release. After a few releases, and once enough users deem the feature stable in production, we will flip the defaults in a subsequent release.
+ - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first, and then upgrade the writers (DeltaStreamer, Spark jobs using the datasource). This often provides the best experience, and it's easy to fix
+   any issues by rolling forward/back the writer code (which you typically have more control over).
+ - With large, feature-rich releases, we recommend migrating slowly, by first testing in staging environments and running your own tests. Upgrading Hudi is no different than upgrading any database system.
+
+Note that release notes can override this information with specific instructions, applicable on a case-by-case basis.
+
+## Migrating
+
+Currently migrating to Hudi can be done using two approaches

Review comment:
   Hi, missing `.` at the end of the statement.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
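As a concrete illustration of the deployment model described in the diff above (writers running as plain Spark jobs, with no long-running Hudi servers), a DeltaStreamer ingestion can be launched with `spark-submit` roughly as follows. This is a minimal sketch, not part of the PR: the bundle jar path, properties file, source class, ordering field, and target path are placeholder values that depend on your environment and Hudi version.

```sh
# Run DeltaStreamer as an ordinary Spark job; Hudi adds no extra infrastructure.
# All paths and the ordering field below are illustrative placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hudi-utilities-bundle.jar \
  --props /path/to/kafka-source.properties \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path hdfs:///data/hudi/my_table \
  --target-table my_table \
  --op UPSERT
```

Scheduling, retries, and resource sizing then follow standard Spark operational practice, which is the point the "Deploying" section is making.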