hudi-commits mailing list archives

From "Vinoth Chandar (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HUDI-687) incremental reads on MOR tables using RO view can lead to missing updates
Date Wed, 18 Mar 2020 04:17:00 GMT

     [ https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-687:
--------------------------------
    Fix Version/s: 0.6.0

> incremental reads on MOR tables using RO view can lead to missing updates
> -------------------------------------------------------------------------
>
>                 Key: HUDI-687
>                 URL: https://issues.apache.org/jira/browse/HUDI-687
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>            Reporter: satish
>            Assignee: satish
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If the compaction requested at t2 takes a long time, incremental reads using HoodieParquetInputFormat
> can skip the data ingested at t1, leading to apparent 'data loss'. (The data is still on disk, but
> incremental readers won't see it because it is in a log file and the readers have moved on to t3.)
> To work around this problem, we want to stop returning data belonging to commits > t1 while the
> compaction is pending. After the compaction is complete, the incremental reader would see the updates
> in t2, t3, and so on.
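
The workaround described in the issue can be illustrated with a small, self-contained sketch. This is not Hudi code: the `Instant` class, the `visible_instants` function, and the action names are hypothetical and only model the example timeline (t0 to t3). The assumption is that an incremental reader should only be handed completed instants that are earlier than the earliest pending compaction, so its checkpoint cannot move past updates that still live only in a log file.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    """Hypothetical model of one timeline entry (not a Hudi class)."""
    time: str        # sortable instant time, e.g. "t0" < "t1"
    action: str      # "commit", "deltacommit", or "compaction"
    completed: bool  # False while the action is only requested/inflight


def visible_instants(timeline: List[Instant], begin_time: str) -> List[Instant]:
    """Return completed instants after begin_time that an incremental
    reader may consume, stopping before the earliest pending compaction."""
    pending = [i.time for i in timeline
               if i.action == "compaction" and not i.completed]
    cutoff = min(pending) if pending else None
    return [
        i for i in sorted(timeline, key=lambda x: x.time)
        if i.completed
        and i.time > begin_time
        and (cutoff is None or i.time < cutoff)
    ]


# Example timeline from the issue: compaction requested at t2, still pending.
timeline_pending = [
    Instant("t0", "commit", True),        # create bucket1.parquet
    Instant("t1", "deltacommit", True),   # append updates to bucket1.log
    Instant("t2", "compaction", False),   # compaction requested, not done
    Instant("t3", "commit", True),        # create bucket2.parquet
]

# Same timeline once the compaction at t2 has completed.
timeline_done = [
    Instant(i.time, i.action, True) for i in timeline_pending
]
```

With the compaction pending, a reader whose checkpoint is already at t1 gets nothing new from `visible_instants(timeline_pending, "t1")`, so it does not jump to t3. Once the compaction completes, `visible_instants(timeline_done, "t1")` yields t2 and t3, matching the behaviour the description asks for.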



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
