hudi-commits mailing list archives

From "Prashant Wason (Jira)" <>
Subject [jira] [Commented] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly
Date Thu, 12 Mar 2020 18:17:00 GMT


Prashant Wason commented on HUDI-667:

It's not a corner case. It will happen every time an insert/update batch is generated after
a delete. I think there are no unit tests like that, and hence we have not seen this issue.


We will hit the issue any time the y deletes are not the largest indexes in existingKeys (the
probability of which is close to 1.0, since we delete randomly).

> HoodieTestDataGenerator does not delete keys correctly
> ------------------------------------------------------
>                 Key: HUDI-667
>                 URL:
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 1h
>          Time Spent: 20m
>  Remaining Estimate: 40m
> HoodieTestDataGenerator is used to generate sample data for unit tests. It allows generating
> HoodieRecords for insert/update/delete. It maintains the record keys in a HashMap.
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
>  # Delete from existingKeys uses KeyPartition rather than Integer keys
>  # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up randomly. Assume
> three values were inserted, then the HashMap will hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2 (generate a random record for deletion), the HashMap
> will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
> Now if we issue an insertBatch(), the insert is existingKeys.put(existingKeys.size(), <new KeyPartition>).
> Because existingKeys.size() is now 2, this overwrites the KeyPartition3 already stored under
> key 2 rather than actually inserting a new entry in the map.
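
The two issues described above can be reproduced with a plain HashMap. The following is a minimal sketch, not the actual HoodieTestDataGenerator code: the class ExistingKeysSketch and its methods are hypothetical names, Strings stand in for KeyPartition, and deleteKeepingDense shows only one possible way to keep the indexes contiguous (it is an assumption, not necessarily the fix applied for HUDI-667).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generator's key bookkeeping; Strings replace KeyPartition.
class ExistingKeysSketch {

    // Mirrors the insert described in the issue: the next index is taken from size().
    static void insertBySize(Map<Integer, String> existingKeys, String keyPartition) {
        existingKeys.put(existingKeys.size(), keyPartition);
    }

    // Issue 1: removing with the value type instead of the Integer key removes nothing.
    static boolean wrongTypeRemoveIsNoOp() {
        Map<Integer, String> existingKeys = new HashMap<>();
        insertBySize(existingKeys, "KeyPartition1");
        Object wrongKey = "KeyPartition1"; // a value, not an Integer key
        String removed = existingKeys.remove(wrongKey);
        return removed == null && existingKeys.size() == 1;
    }

    // Issue 2: after deleting index 1, size() is 2, so the next insert lands on key 2.
    static Map<Integer, String> buggyScenario() {
        Map<Integer, String> existingKeys = new HashMap<>();
        insertBySize(existingKeys, "KeyPartition1"); // key 0
        insertBySize(existingKeys, "KeyPartition2"); // key 1
        insertBySize(existingKeys, "KeyPartition3"); // key 2
        existingKeys.remove(Integer.valueOf(1));     // leaves keys {0, 2}
        insertBySize(existingKeys, "KeyPartition4"); // overwrites key 2
        return existingKeys;
    }

    // One possible remedy (assumption): move the last entry into the freed slot
    // so that keys always cover exactly [0, size). Expects 0 <= index < size.
    static void deleteKeepingDense(Map<Integer, String> existingKeys, int index) {
        int last = existingKeys.size() - 1;
        String moved = existingKeys.remove(Integer.valueOf(last));
        if (index != last) {
            existingKeys.put(index, moved);
        }
    }

    static Map<Integer, String> denseScenario() {
        Map<Integer, String> existingKeys = new HashMap<>();
        insertBySize(existingKeys, "KeyPartition1");
        insertBySize(existingKeys, "KeyPartition2");
        insertBySize(existingKeys, "KeyPartition3");
        deleteKeepingDense(existingKeys, 1);         // KeyPartition3 moves to key 1
        insertBySize(existingKeys, "KeyPartition4"); // lands on the free key 2
        return existingKeys;
    }

    public static void main(String[] args) {
        System.out.println("wrong-type remove is a no-op: " + wrongTypeRemoveIsNoOp());
        System.out.println("buggy: " + buggyScenario());
        System.out.println("dense: " + denseScenario());
    }
}
```

In the buggy scenario the map ends with two entries and KeyPartition3 is lost; in the dense scenario all three surviving entries are retained. Swapping the last entry into the freed slot keeps random lookup by index working, at the cost of changing which index a surviving key lives under.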

This message was sent by Atlassian Jira
