hudi-commits mailing list archives

From "Prashant Wason (Jira)" <>
Subject [jira] [Created] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly
Date Fri, 06 Mar 2020 19:49:00 GMT
Prashant Wason created HUDI-667:

             Summary: HoodieTestDataGenerator does not delete keys correctly
                 Key: HUDI-667
             Project: Apache Hudi (incubating)
          Issue Type: Bug
            Reporter: Prashant Wason

HoodieTestDataGenerator is used to generate sample data for unit tests. It can generate
HoodieRecords for inserts, updates and deletes, and it tracks the keys of the generated records in a HashMap:

private final Map<Integer, KeyPartition> existingKeys;

There are two issues in the implementation:
 # Deletion from existingKeys is attempted with the KeyPartition (the map value) instead of the Integer key
 # Inserting records after a delete is not handled correctly
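Issue 1 can be reproduced with plain java.util collections. The sketch below assumes the delete path passes the KeyPartition value to existingKeys.remove(); KeyPartition here is a hypothetical two-field stand-in for Hudi's class, and the partition path is made up. Because Map.remove(Object) looks its argument up as a key, the call compiles but never matches an Integer key, so the entry stays in the map.

```java
import java.util.HashMap;
import java.util.Map;

public class RemoveByValueDemo {

    // Hypothetical stand-in for Hudi's KeyPartition: a record key plus a partition path.
    static final class KeyPartition {
        final String recordKey;
        final String partitionPath;

        KeyPartition(String recordKey, String partitionPath) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
        }
    }

    // Removes using the value: Map.remove(Object) compares against keys, so nothing is removed.
    static int sizeAfterRemoveByValue() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        KeyPartition kp = new KeyPartition("key-0", "2020/03/06");
        existingKeys.put(0, kp);
        existingKeys.remove(kp); // compiles (remove takes Object) but matches no Integer key
        return existingKeys.size();
    }

    // Removes using the Integer index the entry was stored under.
    static int sizeAfterRemoveByKey() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        existingKeys.put(0, new KeyPartition("key-0", "2020/03/06"));
        existingKeys.remove(0); // autoboxed to Integer 0, which is a real key
        return existingKeys.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRemoveByValue()); // 1
        System.out.println(sizeAfterRemoveByKey());   // 0
    }
}
```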

The implementation keys the map by an Integer index so that a random existing record can be looked up
by picking a random index. Suppose three records have been inserted; the HashMap then holds:

0 -> KeyPartition1
1 -> KeyPartition2
2 -> KeyPartition3
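A minimal sketch of this indexing scheme, assuming new entries are stored at index existingKeys.size() and a random existing record is chosen with Random.nextInt(existingKeys.size()). Plain strings stand in for KeyPartition objects, and DenseIndexDemo, insert and pickRandom are hypothetical names, not Hudi's API. The random pick only lands on a real entry while the indices are exactly 0..size()-1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class DenseIndexDemo {
    // Hypothetical: Strings stand in for Hudi's KeyPartition values.
    private final Map<Integer, String> existingKeys = new HashMap<>();
    private final Random rand = new Random(42);

    // New entries go in at index size(), so indices stay 0..size()-1 while nothing is deleted.
    void insert(String key) {
        existingKeys.put(existingKeys.size(), key);
    }

    // A random existing key is picked by drawing an index in [0, size()).
    String pickRandom() {
        return existingKeys.get(rand.nextInt(existingKeys.size()));
    }

    Map<Integer, String> snapshot() {
        return existingKeys;
    }

    public static void main(String[] args) {
        DenseIndexDemo gen = new DenseIndexDemo();
        gen.insert("KeyPartition1");
        gen.insert("KeyPartition2");
        gen.insert("KeyPartition3");
        System.out.println(gen.snapshot());           // {0=KeyPartition1, 1=KeyPartition2, 2=KeyPartition3}
        System.out.println(gen.pickRandom() != null); // true
    }
}
```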

Now suppose KeyPartition2 is deleted (it was the record randomly chosen when generating a delete). The HashMap then holds:

0 -> KeyPartition1
2 -> KeyPartition3


Now if we call insertBatch(), the insert performs existingKeys.put(existingKeys.size(), ...) with the
new KeyPartition as the value. Since existingKeys.size() is now 2, this becomes put(2, ...), which
overwrites the KeyPartition3 already in the map instead of adding a new entry.

This message was sent by Atlassian Jira
