hive-dev mailing list archives

From "Eugene Koifman (JIRA)" <>
Subject [jira] [Created] (HIVE-18098) Add support for Export/Import for Acid tables
Date Fri, 17 Nov 2017 22:50:01 GMT
Eugene Koifman created HIVE-18098:

             Summary: Add support for Export/Import for Acid tables
                 Key: HIVE-18098
             Project: Hive
          Issue Type: New Feature
          Components: Transactions
            Reporter: Eugene Koifman

How should this work?
For regular tables, export just copies the files under the table root to a specified directory.
This doesn't make sense for Acid tables:
* Some data may belong to aborted transactions.
* Transaction IDs are embedded in data/file names. You'd have to export the delta/ and base/
directories, each of which may contain files with the same names, e.g. bucket_00000.
* On import, these IDs won't make sense in a different cluster or even in a different table,
which may have its own delta_x_x for the same x (with different data, of course).
* Export creates a _metadata file with column types, storage format, etc. Perhaps it can also
include info about aborted IDs (if the whole file can't be skipped).
* Even importing into the same table on the same cluster may be a problem. For example, delta_5_5/
existed at the time of export and was included in the export, but 2 days later it may no longer
exist because it was compacted and cleaned.
* If importing back into the same table on the same cluster, the data could be imported into
a different transaction (assuming per-table writeIDs) without having to remap the IDs in the rows.
* Support Import Overwrite?
* Support Import as a new txn with remapping of ROW__IDs? The new writeID can be stored in
a delta_x_x/_meta_data file, and ROW__IDs can be remapped at read time (like isOriginal) and made
permanent by compaction.
* It doesn't seem reasonable to import acid data into a non-acid table.
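To make a few of these options concrete, here is a minimal, hypothetical sketch (not Hive code) of two of them: an export that skips deltas whose IDs all aborted and records the aborted IDs of partly aborted deltas for _metadata, and an import "as a new txn" that gives all imported data one new write ID. The function names plan_export/plan_import, the in-memory directory listing, and the statement-ID suffix used to keep renamed directories distinct (delta_<id>_<id>_<stmt>) are all assumptions for illustration; delete_delta directories are ignored.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sketch, not Hive code. Directory names follow the
# delta_<min>_<max>/ and base_<id>/ convention mentioned in this issue.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")
BASE_RE = re.compile(r"^base_(\d+)$")


def delta_range(dir_name: str) -> Optional[Tuple[int, int]]:
    """Return the (min, max) write IDs of a delta directory, else None."""
    m = DELTA_RE.match(dir_name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def plan_export(listing: Dict[str, List[str]],
                aborted: Set[int]) -> Tuple[List[str], Set[int]]:
    """Pick the files to copy and the aborted IDs to record in _metadata.

    listing maps a directory name to its files, e.g.
    {"base_4": ["bucket_00000"], "delta_5_5": ["bucket_00000"]}.
    Paths keep their parent directory because bucket_00000 can exist in
    several directories and would collide in a flat copy.
    """
    copies: List[str] = []
    aborted_kept: Set[int] = set()
    for dir_name in sorted(listing):
        rng = delta_range(dir_name)
        if rng is None and not BASE_RE.match(dir_name):
            continue  # not a base_/delta_ directory; ignored in this sketch
        if rng is not None:
            lo, hi = rng
            hit = {i for i in aborted if lo <= i <= hi}
            if len(hit) == hi - lo + 1:
                continue  # every ID in the delta aborted: skip it entirely
            aborted_kept |= hit  # partly aborted: a reader must filter these
        copies.extend(f"{dir_name}/{f}" for f in sorted(listing[dir_name]))
    return copies, aborted_kept


def plan_import(exported_dirs: List[str], new_write_id: int) -> Dict[str, str]:
    """Rename exported directories so all imported data carries one write ID.

    Assumes a statement-ID suffix keeps the renamed directories distinct.
    The original IDs would still have to be stored (e.g. in a _meta_data
    file) so ROW__IDs can be remapped at read time.
    """
    renamed: Dict[str, str] = {}
    for stmt, dir_name in enumerate(sorted(exported_dirs)):
        renamed[dir_name] = f"delta_{new_write_id}_{new_write_id}_{stmt:04d}"
    return renamed
```

For example, with aborted IDs {6, 8}, delta_6_6/ is dropped entirely, while delta_7_9/ is exported and ID 8 is recorded so its rows can be filtered on read. The sketch does not address the compaction race described above (delta_5_5/ disappearing between export and re-import).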

This message was sent by Atlassian JIRA
