hudi-commits mailing list archives

From "satish (Jira)" <>
Subject [jira] [Created] (HUDI-689) Fix hudi cli commands with overlap
Date Tue, 10 Mar 2020 17:23:00 GMT
satish created HUDI-689:

             Summary: Fix hudi cli commands with overlap
                 Key: HUDI-689
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
            Reporter: satish
            Assignee: satish

Example timeline:

t0 -> create bucket1.parquet
t1 -> create and append updates bucket1.log
t2 -> request compaction 
t3 -> create bucket2.parquet

If the compaction requested at t2 takes a long time, incremental reads using HoodieParquetInputFormat
can skip the data ingested at t1, leading to 'data loss'. (The data is still on disk, but incremental
readers won't see it, because the t1 updates are only in the log file and readers move on to t3.)
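A minimal sketch of the failure, assuming a simplified model that is hypothetical and not Hudi's API. Each instant is a tuple of time, action, completion flag and the parquet files it wrote. The reader only sees parquet files, as with HoodieParquetInputFormat, and advances its checkpoint to the latest completed instant it consumed. The compacted file name is made up for the example.

```python
# Hypothetical, simplified model of the timeline above; these names are
# illustrative and are not Hudi APIs.
# Each instant: (time, action, completed, parquet files it wrote)
timeline = [
    (0, "commit", True, ["bucket1.parquet"]),
    (1, "deltacommit", True, []),          # t1 updates live only in bucket1.log
    (2, "compaction", False, []),          # requested at t2, still running
    (3, "commit", True, ["bucket2.parquet"]),
]


def naive_incremental_pull(timeline, checkpoint):
    """Return parquet files from completed instants after `checkpoint`
    and the checkpoint the reader advances to."""
    done = [i for i in timeline if i[2] and i[0] > checkpoint]
    files = [f for i in done for f in i[3]]
    new_checkpoint = max((i[0] for i in done), default=checkpoint)
    return files, new_checkpoint


# Reader has consumed t0 and pulls again while compaction is still pending.
files, checkpoint = naive_incremental_pull(timeline, 0)
print(files, checkpoint)  # ['bucket2.parquet'] 3

# Compaction finishes at t2 and rewrites bucket1 with the t1 updates
# (hypothetical file name).
timeline[2] = (2, "compaction", True, ["bucket1_compacted.parquet"])
files, checkpoint = naive_incremental_pull(timeline, checkpoint)
print(files, checkpoint)  # [] 3  -> the compacted t1 updates are never read
```

The second pull returns nothing, because the compacted file belongs to t2 and the reader's checkpoint is already at t3.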

To work around this problem, we want to stop returning data belonging to commits later than t1
while the compaction is pending. After the compaction is complete, incremental readers will see
the updates in t2, t3, and so on.
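One way to express that workaround, as a sketch under stated assumptions: the cutoff is taken to be the earliest pending compaction instant, so readers only get completed instants strictly before it. The tuple layout and `capped_instants` are hypothetical illustrations, not Hudi's timeline API.

```python
# Hypothetical sketch of the workaround; the tuple layout and function
# name are illustrative, not Hudi APIs.
# Each instant: (time, action, completed)


def capped_instants(timeline, checkpoint):
    """Completed instants after `checkpoint` that an incremental reader may
    consume, excluding everything at or after the earliest pending compaction."""
    pending = [t for t, action, done in timeline if action == "compaction" and not done]
    cap = min(pending) if pending else None
    return [
        t
        for t, _action, done in timeline
        if done and t > checkpoint and (cap is None or t < cap)
    ]


timeline = [
    (0, "commit", True),
    (1, "deltacommit", True),
    (2, "compaction", False),  # pending: caps what readers can see
    (3, "commit", True),
]
print(capped_instants(timeline, 0))  # [1]  -> checkpoint stops at t1

# Once compaction completes, the cap is lifted and t2, t3 become visible.
timeline[2] = (2, "compaction", True)
print(capped_instants(timeline, 1))  # [2, 3]
```

Capping at the earliest pending compaction keeps the reader's checkpoint at t1, so nothing written at or after t2 is consumed until the compacted data is available.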

This message was sent by Atlassian Jira
