From: GitBox
To: commits@hudi.apache.org
Subject: [GitHub] [incubator-hudi] nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
Date: Wed, 15 Jan 2020 04:39:36 -0000
Message-ID: <157906317683.7554.16033557365897354348.gitbox@gitbox.apache.org>

nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574492697

@bhasudha: I have changed the way we generate deletes. I now pass in the insert records for which delete records should be generated. With the previous approach of generating random deletes, I couldn't verify whether the deletes actually removed any records, so I have taken this approach instead.

The steps I plan to add to the Quick Start are as follows:

- Generate a new batch of inserts.
- Fetch all records from this new batch (set the rider value below to match the new batch, since each batch has a unique rider value):

    val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'")

- Generate delete records:

    val deletes = dataGen.generateDeletes(ds.collectAsList())

- Issue the deletes:

    val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2))
    df.write.format("org.apache.hudi").
      options(getQuickstartWriteConfigs).
      option(OPERATION_OPT_KEY, "delete").
      option(PRECOMBINE_FIELD_OPT_KEY, "ts").
      option(RECORDKEY_FIELD_OPT_KEY, "uuid").
      option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
      option(TABLE_NAME, tableName).
      mode(Append).
      save(basePath)

- Re-run the same select query; it should now return 0 records, since all records from the batch have been deleted:

    spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'").count()
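
For illustration only, here is a rough sketch of the idea behind the delete-generation step: take the keys of records that were actually inserted and emit one JSON delete record per key, so the result can be verified afterwards. This is not the code in the PR. `DeleteSketch`, `RecordKey`, and the constant `ts` value are hypothetical names and assumptions; only the field names `uuid`, `partitionpath`, and `ts` come from the write options in the steps above.

```scala
// Hypothetical, Spark-free sketch of generating delete records from inserted keys.
object DeleteSketch {

  // Stand-in for one (uuid, partitionpath) row returned by the select query.
  final case class RecordKey(uuid: String, partitionPath: String)

  // Build one JSON delete record per inserted key.
  // Field names follow the write options (uuid, partitionpath, ts);
  // the ts value of 0.0 is an assumption made for this sketch.
  def generateDeletes(keys: Seq[RecordKey]): Seq[String] =
    keys.map { k =>
      s"""{"ts": 0.0, "uuid": "${k.uuid}", "partitionpath": "${k.partitionPath}"}"""
    }

  def main(args: Array[String]): Unit = {
    // Made-up keys standing in for the rows of the freshly inserted batch.
    val inserted = Seq(
      RecordKey("id-1", "americas/brazil/sao_paulo"),
      RecordKey("id-2", "asia/india/chennai")
    )
    generateDeletes(inserted).foreach(println)
  }
}
```

Because every delete record is derived from a key that is known to exist, the follow-up count query has a definite expected result (0), which random deletes could not guarantee.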