From: GitBox
To: commits@hudi.apache.org
Subject: [GitHub] [incubator-hudi] pratyakshsharma commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
Date: Mon, 02 Mar 2020 09:28:40 -0000
Reply-To: dev@hudi.apache.org

pratyakshsharma commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-593306470

@garyli1019 If I understand correctly, you are describing a use case where you run HoodieDeltaStreamer alongside a Spark data source writer as a backup. Why do you want two different pipelines writing to the same destination path? If you really want a backup to prevent data loss, you can write to a separate path using the Spark data source and continue using DeltaStreamer to write to the Hudi dataset.
In case of any issues, you can always use CHECKPOINT_RESET_KEY to ingest the data from your backup path into your Hudi dataset path. We support both Kafka and DFS sources for this purpose. Also, what is the source for your homebrew Spark pipeline? If it also consumes from Kafka, then I do not see any case where using DeltaStreamer can result in data loss. Can you please explain why you want to use two pipelines to write to the same destination path?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: users@infra.apache.org

With regards,
Apache Git Services
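For readers following along: a rough sketch of what re-ingesting a backup path with DeltaStreamer's DFS source and an explicit checkpoint override could look like. All paths, the table name, and the ordering field below are hypothetical placeholders, not values from this thread; consult the Hudi DeltaStreamer docs for your version before relying on any flag.

```shell
# Hypothetical sketch: replay a backup directory into an existing Hudi
# dataset via HoodieDeltaStreamer's DFS source, overriding the checkpoint
# so ingestion starts from the backup path rather than a prior commit.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field ts \
  --target-base-path /data/hudi/my_table \
  --target-table my_table \
  --checkpoint /backups/my_table/ \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=/backups/my_table/
```

Once the backup is replayed, the regular Kafka-sourced DeltaStreamer run can resume against the same target path.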