hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
Date Wed, 15 Jan 2020 04:39:36 GMT
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574492697
 
 
   @bhasudha : I have changed how we generate deletes. I now pass in the insert records
for which delete records should be generated. With the previous approach of generating
random deletes, I couldn't verify that the deletes actually removed any records, so I have
taken this approach instead.
   
   The steps I plan to add to the Quick Start are as follows:
   
   - Generate a new batch of inserts.
   - Fetch all records from this new batch (adjust the rider value below, since each batch
will have a unique rider value)
   val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'")
   - Generate delete records
   val deletes = dataGen.generateDeletes(ds.collectAsList())
   - Issue deletes
   val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2));
   df.write.format("org.apache.hudi").
       options(getQuickstartWriteConfigs).
       option(OPERATION_OPT_KEY,"delete").
       option(PRECOMBINE_FIELD_OPT_KEY, "ts").
       option(RECORDKEY_FIELD_OPT_KEY, "uuid").
       option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
       option(TABLE_NAME, tableName).
       mode(Append).
       save(basePath);
   
   - The same select query as above should now return 0 records, since all of them have been deleted.
   spark.sql("select uuid, partitionPath from  hudi_ro_table where rider = 'rider-213'").count()
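
To illustrate why generating deletes from known inserts makes the result verifiable, here is a minimal plain-Scala sketch of the flow above. It does not use Spark or Hudi: `TripKey`, `Trip`, `generateDeletes` and `applyDeletes` are hypothetical names used only for illustration, not the util methods added in this PR, and the record values are made-up samples.

```scala
// In-memory model of the delete flow; no Spark or Hudi involved.
// All names below are illustrative assumptions, not Hudi APIs.
final case class TripKey(uuid: String, partitionPath: String)
final case class Trip(key: TripKey, rider: String, ts: Long)

object DeleteFlowSketch {
  // Build one delete marker per known insert key (duplicates collapsed).
  def generateDeletes(keys: Seq[TripKey]): Seq[TripKey] = keys.distinct

  // Drop every record whose key appears in the delete batch.
  def applyDeletes(table: Seq[Trip], deletes: Seq[TripKey]): Seq[Trip] = {
    val toDelete = deletes.toSet
    table.filterNot(t => toDelete.contains(t.key))
  }

  def main(args: Array[String]): Unit = {
    val table = Seq(
      Trip(TripKey("u1", "americas/brazil/sao_paulo"), "rider-213", 1L),
      Trip(TripKey("u2", "asia/india/chennai"), "rider-213", 2L),
      Trip(TripKey("u3", "asia/india/chennai"), "rider-001", 3L)
    )

    // Step 2: fetch the keys of the batch we intend to delete.
    val batchKeys = table.filter(_.rider == "rider-213").map(_.key)

    // Steps 3 and 4: generate deletes for exactly those keys and apply them.
    val after = applyDeletes(table, generateDeletes(batchKeys))

    // Step 5: the same filter must now match nothing; other records survive.
    assert(after.count(_.rider == "rider-213") == 0)
    assert(after.size == 1)
    println(s"remaining records: ${after.size}")
  }
}
```

Because the delete batch is derived from keys that are known to exist, the final count is deterministic. With randomly generated deletes the keys may not match any stored record, so a count alone would not show whether the delete took effect.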
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
