hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] vinothchandar commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
Date Mon, 09 Mar 2020 07:05:31 GMT
vinothchandar commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous
commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-596362404
 
 
   Okay, caught up now.
   
   First, writing in parallel from two jobs is dangerous, as Hudi does not support such multi-writer access. I would advise against it (although you could probably hack it to work if you tried hard enough).
   
   @garyli1019 we can definitely add tooling to generate checkpoints in the format that DeltaStreamer expects. But I would like to decouple that from the DeltaStreamer itself. I favor keeping it simple, with just a single knob for a user who wants to override the checkpoint. I believe there is already an option to override the checkpoint:
   ```
       /**
        * Resume Delta Streamer from this checkpoint.
        */
       @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
       public String checkpoint = null;
   ```
   
   >> I need a robust way to generate the checkpoint from kafka-connect-hdfs managed
files and kafka-connect itself sometimes having an issue to retrieve checkpoint when the Kafka
partition number was large
   
   I would like to understand this more in general. For DFS sources, all you need is a timestamp, right? And for Kafka, you need to call `consumer.offsetsForTimes()` to get a set of offsets to override with.
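
As a rough illustration of the Kafka case, here is a minimal sketch of turning per-partition offsets into a single string that could be passed to `--checkpoint`. Everything in it is an assumption made for illustration: the class `CheckpointSketch`, the method `format`, the topic name `events`, and in particular the `topic,partition:offset,...` layout, which should be checked against what DeltaStreamer actually expects. In a real job the offsets would come from `KafkaConsumer.offsetsForTimes()`, which returns a map from `TopicPartition` to `OffsetAndTimestamp`; a plain `Map<Integer, Long>` stands in for that here so the sketch runs without a Kafka client.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi or Kafka.
final class CheckpointSketch {

    /**
     * Builds "topic,partition:offset,partition:offset,...".
     * ASSUMPTION: this layout is illustrative; verify the real checkpoint format.
     */
    static String format(String topic, Map<Integer, Long> offsetsByPartition) {
        // Copy into a TreeMap so partitions are emitted in ascending order.
        Map<Integer, Long> sorted = new TreeMap<>(offsetsByPartition);
        String body = sorted.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
        // With no partitions, return just the topic name.
        return body.isEmpty() ? topic : topic + "," + body;
    }

    public static void main(String[] args) {
        // Stand-in for offsets looked up by timestamp, keyed by partition id.
        Map<Integer, Long> offsets = Map.of(1, 200L, 0, 100L);
        System.out.println(format("events", offsets));
    }
}
```

Keeping this as a standalone helper matches the idea of generating checkpoints outside the streamer and handing the result to the single override option.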
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
