hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
Date Tue, 03 Mar 2020 05:09:44 GMT
garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits
in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-593770649
 
 
   I think running the parallel jobs once sounds a bit hacky. The better approach is to
generate the checkpoint string and pass it to the delta streamer on its first run. To do
that, I would need to write a checkpoint generator that scans all the files written by Kafka
Connect. This is definitely doable but takes some effort.
   So I think we can do the following to help users migrate to delta streamer:
   - Provide checkPointGenerator helper functions that build the checkpoint from the output
of popular sink connectors (Kafka Connect, Spark Streaming, etc.).
   - Allow the user to commit without using delta streamer, to close the gap when the
checkpoint is difficult to generate.
   Any thoughts?
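
   A minimal sketch of what such a checkpoint generator could look like, under several
assumptions that are not confirmed in this thread: sink files are named
<topic>+<partition>+<startOffset>+<endOffset>.<ext>, the end offset in the file name is
inclusive, and the delta streamer Kafka checkpoint string has the form
topic,partition:offset,partition:offset. The function name generate_checkpoint is
hypothetical and is not part of Hudi.

```python
import re
from typing import Dict, Iterable

# Assumed file-name layout: <topic>+<partition>+<startOffset>+<endOffset>.<ext>
_FILE_RE = re.compile(
    r"^(?P<topic>.+)\+(?P<partition>\d+)\+(?P<start>\d+)\+(?P<end>\d+)\.\w+$"
)


def generate_checkpoint(file_names: Iterable[str], topic: str) -> str:
    """Build a checkpoint string from sink file names (hypothetical helper).

    Assumed output format: "<topic>,<partition>:<offset>,<partition>:<offset>".
    """
    next_offsets: Dict[int, int] = {}
    for name in file_names:
        # Only the base name carries the offsets; ignore any directory prefix.
        match = _FILE_RE.match(name.rsplit("/", 1)[-1])
        if match is None or match.group("topic") != topic:
            continue  # skip marker files and files that belong to other topics
        partition = int(match.group("partition"))
        # Assumption: the end offset is inclusive, so reading resumes at end + 1.
        resume_at = int(match.group("end")) + 1
        next_offsets[partition] = max(next_offsets.get(partition, 0), resume_at)

    if not next_offsets:
        raise ValueError(f"no sink files found for topic {topic!r}")

    parts = ",".join(f"{p}:{o}" for p, o in sorted(next_offsets.items()))
    return f"{topic},{parts}"


if __name__ == "__main__":
    files = [
        "/topics/events/partition=0/events+0+0000000000+0000000099.avro",
        "/topics/events/partition=0/events+0+0000000100+0000000249.avro",
        "/topics/events/partition=1/events+1+0000000000+0000000041.avro",
        "/topics/events/_SUCCESS",
    ]
    print(generate_checkpoint(files, "events"))
```

   The file list would come from listing the sink directory; the resulting string would
then be handed to the delta streamer for its first run. If the connector records exclusive
end offsets instead, the "+ 1" should be dropped.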

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
