hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
Date Mon, 09 Mar 2020 06:49:31 GMT
vinothchandar edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-596357138
 
 
   >>then start the delta streamer, hudi will store the empty checkpoint.
   Re-reading this: is this the right behavior? I think a few cases now handled in delta-streamer have made life a bit complicated.
   
   The reason for writing such an empty checkpoint could be that we want to write checkpoints even for empty commits, since the source could have read data that the transformer then filtered out entirely.
   I think the right fix could be to checkpoint the actual fromOffsets instead of an empty checkpoint.
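
A minimal Python sketch of that idea, not the delta-streamer code itself: when a round produces no end offsets, the stored checkpoint falls back to the resolved starting offsets rather than an empty value. The function names and the `topic,partition:offset` string layout are assumptions made for illustration only.

```python
from typing import Dict, Optional

Offsets = Dict[int, int]  # Kafka partition id -> next offset to read


def format_checkpoint(topic: str, offsets: Offsets) -> str:
    """Serialize offsets as 'topic,partition:offset,...' (hypothetical layout)."""
    parts = [f"{partition}:{offset}" for partition, offset in sorted(offsets.items())]
    return ",".join([topic] + parts)


def checkpoint_to_store(topic: str,
                        from_offsets: Offsets,
                        to_offsets: Optional[Offsets] = None) -> str:
    """Pick the checkpoint to persist with a commit, including empty commits.

    If the source read nothing (no end offsets), persist the starting offsets
    that were actually resolved, so the next round does not restart from 0.
    """
    offsets = to_offsets if to_offsets else from_offsets
    return format_checkpoint(topic, offsets)
```

With this shape, a commit whose records were all filtered out by the transformer still advances to the end offsets, while a round that read nothing keeps the starting position.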

   
   >>the second commit will use the last checkpoint {}, which means the fromoffset is 0.
   but the previous messages may be removed because of kafka retention mechanism.
   
   And this is because we enter `checkupValidOffsets` right? 
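
To make the retention concern concrete, here is a small Python sketch of clamping checkpointed offsets to what the broker still retains. It is an illustration under assumed names and does not reproduce what `checkupValidOffsets` actually does.

```python
from typing import Dict

Offsets = Dict[int, int]  # Kafka partition id -> offset


def clamp_to_available(checkpointed: Offsets, earliest: Offsets) -> Offsets:
    """Return safe starting offsets for every partition the broker reports.

    A partition missing from the checkpoint (for example an empty checkpoint {})
    or one whose checkpointed offset was already removed by retention starts
    from the earliest offset still available.
    """
    return {
        partition: max(checkpointed.get(partition, first), first)
        for partition, first in earliest.items()
    }
```

For example, with an empty checkpoint every partition starts at its earliest retained offset instead of at offset 0, which may no longer exist.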
   
   I'd appreciate it if we considered how checkpoints are handled in a general, source-agnostic way while also fixing this issue.
   

