hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
Date Wed, 04 Mar 2020 19:16:53 GMT
garyli1019 commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits
in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-594767090
 
 
   Yeah, I definitely agree that there is some work to do to improve the migration process
to the delta streamer. In order to use `deltastreamer.checkpoint.reset_key` I will need something
like the `checkpointGenerator` mentioned above; otherwise it would be difficult to find the
correct checkpoint for each table. I have a few hundred tables to manage, so I do need
a robust and trustworthy solution for the migration.
   Also, I think it makes sense to give more options to the users to play around with the
delta streamer for their own use cases.  
   e.g. 
   - Allow the user to get checkpoint from commits older than the last commit (this PR)
   - Allow the user to get checkpoint from a specific commit
   - Allow the user to store checkpoint info in the commit metadata even if they are not using
delta streamer. For example, when they are using HDFS importer or Spark Datasource writer
to do the initial bulk_insert.
   - Maybe more ...
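
A minimal sketch of the first two options, not Hudi's actual API: it assumes each commit is modelled as a `(commit_time, extra_metadata)` pair, that the checkpoint lives in the extra metadata under a key such as `deltastreamer.checkpoint.key`, and that commit times are fixed-width timestamp strings. The function name `find_checkpoint` and its `at_or_before` parameter are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Assumed name of the metadata key under which a checkpoint is stored.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"

Commit = Tuple[str, Mapping[str, str]]  # (commit_time, extra_metadata)


def find_checkpoint(
    commits: Sequence[Commit], at_or_before: Optional[str] = None
) -> Optional[str]:
    """Return the newest checkpoint found while walking commits backwards.

    `commits` must be sorted from oldest to newest. If `at_or_before` is
    given, commits newer than that commit time are ignored, which covers
    the "checkpoint from a specific commit" case.
    """
    for commit_time, metadata in reversed(commits):
        if at_or_before is not None and commit_time > at_or_before:
            continue  # newer than the requested commit, skip it
        checkpoint = metadata.get(CHECKPOINT_KEY)
        if checkpoint:
            return checkpoint  # first hit is the most recent checkpoint
    return None  # no commit in range carries a checkpoint


# Hypothetical timeline: the last commit was written without a checkpoint,
# for example by a bulk_insert outside the delta streamer.
timeline = [
    ("20200301000000", {CHECKPOINT_KEY: "topic,0:100"}),
    ("20200302000000", {CHECKPOINT_KEY: "topic,0:200"}),
    ("20200303000000", {}),
]
latest = find_checkpoint(timeline)
pinned = find_checkpoint(timeline, at_or_before="20200301000000")
```

Walking backwards instead of reading only the last commit is what lets a table whose newest commit has no checkpoint still resume from an earlier one.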
   
   With that flexibility, I believe the user will be able to use the delta streamer in a
more programmatic way. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
