hudi-commits mailing list archives

From "leesf (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HUDI-308) Avoid Renames for tracking state transitions of all actions on dataset
Date Mon, 03 Feb 2020 01:52:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

leesf updated HUDI-308:
-----------------------
    Fix Version/s:     (was: 0.5.2)
                   0.5.1

> Avoid Renames for tracking state transitions of all actions on dataset
> ----------------------------------------------------------------------
>
>                 Key: HUDI-308
>                 URL: https://issues.apache.org/jira/browse/HUDI-308
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Balaji Varadarajan
>            Assignee: Balaji Varadarajan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>         Attachments: IMG_0118.jpg
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we employ renames when transitioning between states (REQUESTED, INFLIGHT, COMPLETED)
of all actions in Hudi.
> The idea is to always create a new file for each state of an action (commit,
compaction, clean, ...) that is being performed and have the Timeline management resolve
conflicts when loading them from the .hoodie folder. The archiving logic will clean up transient
state files and archive terminal state files.
> This handling will be done consistently for all kinds of actions on datasets. As part
of this project, we will clean up unnecessary fields in the metadata, version them, and standardize
on avro/json.
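
The proposal above can be illustrated with a minimal sketch. It is not Hudi's implementation: the file naming scheme ('<instant>.<action>[.<state>]', with no suffix meaning COMPLETED) and the function names parse_state_file, resolve_timeline and transient_files are assumptions made for illustration only. It shows one way a timeline loader could resolve several per-state files for the same action down to the most advanced state, and how an archiver could identify the superseded (transient) files.

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple


class State(IntEnum):
    # Ordered so that a later state compares greater than an earlier one.
    REQUESTED = 0
    INFLIGHT = 1
    COMPLETED = 2


def parse_state_file(name: str) -> Tuple[str, str, State]:
    """Split a hypothetical '<instant>.<action>[.<state>]' file name.

    Assumption for this sketch: a name without a state suffix
    represents the COMPLETED (terminal) state.
    """
    parts = name.split(".")
    if len(parts) == 2:
        instant, action = parts
        return instant, action, State.COMPLETED
    if len(parts) == 3:
        instant, action, suffix = parts
        return instant, action, State[suffix.upper()]
    raise ValueError(f"unrecognized state file name: {name!r}")


def resolve_timeline(file_names: Iterable[str]) -> Dict[Tuple[str, str], State]:
    """Keep only the most advanced state seen for each (instant, action)."""
    latest: Dict[Tuple[str, str], State] = {}
    for name in file_names:
        instant, action, state = parse_state_file(name)
        key = (instant, action)
        if key not in latest or state > latest[key]:
            latest[key] = state
    return latest


def transient_files(file_names: Iterable[str]) -> List[str]:
    """Return files whose state has been superseded by a later state file."""
    names = list(file_names)
    latest = resolve_timeline(names)
    superseded = []
    for name in names:
        instant, action, state = parse_state_file(name)
        if state < latest[(instant, action)]:
            superseded.append(name)
    return sorted(superseded)
```

Because each transition only creates a new file, no file is renamed in this sketch; a reader that lists the folder mid-transition simply sees one or more state files for the same action and picks the most advanced one.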



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
