hudi-commits mailing list archives

From "Yanjia Gary Li (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HUDI-644) checkpoint generator tool for delta streamer
Date Thu, 12 Mar 2020 00:43:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanjia Gary Li updated HUDI-644:
--------------------------------
    Summary: checkpoint generator tool for delta streamer  (was: Enable to retrieve checkpoint from previous commits in Delta Streamer)

> checkpoint generator tool for delta streamer
> --------------------------------------------
>
>                 Key: HUDI-644
>                 URL: https://issues.apache.org/jira/browse/HUDI-644
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Yanjia Gary Li
>            Assignee: Yanjia Gary Li
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket is to resolve the following problem:
> The user is using a homebrew Spark data source to read new data and write it to a Hudi table.
> The user would like to migrate to Delta Streamer.
> However, Delta Streamer only checks the metadata of the last commit. If that commit has no checkpoint info, Delta Streamer falls back to the default, which for the Kafka source is LATEST.
> The user would like to run the homebrew Spark data source reader and Delta Streamer in parallel to prevent data loss, but the Spark data source writer makes commits without checkpoint info, which resets Delta Streamer's checkpoint.
> So an option that lets the user retrieve the checkpoint from previous commits, instead of only the latest commit, would be helpful for the migration.
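
The lookup requested above can be sketched roughly as follows. This is an illustration only, not Hudi code: commits are modeled as plain dictionaries of extra metadata ordered oldest to newest, the function name `latest_checkpoint` and the `lookback` parameter are hypothetical, and the metadata key is assumed to be the one Delta Streamer writes.

```python
from typing import Mapping, Optional, Sequence

# Assumed metadata key under which Delta Streamer stores its checkpoint.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def latest_checkpoint(
    commits: Sequence[Mapping[str, str]],
    lookback: Optional[int] = None,
) -> Optional[str]:
    """Return the most recent checkpoint found in the commit metadata.

    `commits` is ordered oldest to newest. Commits written by another
    writer (for example a plain Spark data source job) carry no checkpoint
    and are skipped. `lookback` optionally limits how many of the newest
    commits are inspected; None means inspect all of them.
    """
    newest_first = list(reversed(commits))
    if lookback is not None:
        newest_first = newest_first[:lookback]
    for metadata in newest_first:
        checkpoint = metadata.get(CHECKPOINT_KEY)
        if checkpoint:
            return checkpoint
    # No checkpoint found: the caller falls back to the source default.
    return None


if __name__ == "__main__":
    timeline = [
        {CHECKPOINT_KEY: "topic,0:100"},  # written by Delta Streamer
        {},                               # written by the Spark data source job
    ]
    print(latest_checkpoint(timeline))
    print(latest_checkpoint(timeline, lookback=1))
```

With `lookback=1` the sketch reproduces the behavior described in the ticket, where only the last commit is checked and the checkpoint is lost as soon as another writer commits.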



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
