hive-issues mailing list archives

From "anishek (JIRA)" <>
Subject [jira] [Commented] (HIVE-17367) IMPORT table doesn't load from data dump if a metadata-only dump was already imported.
Date Thu, 31 Aug 2017 16:43:00 GMT


anishek commented on HIVE-17367:

[~sankarh], can you rebase the patch? It does not currently apply cleanly. Also, could you
please open a pull request for it?

One initial comment:
The replicationSpec should not have its currentEventId updated in TableExport; all of that
should be done before we call TableExport. For replication, that is already done in
ReplDumpTask; for export, it should be done in ExportSemanticAnalyzer, not in TableExport.
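
A minimal sketch of the separation suggested above, assuming simplified stand-in types: Spec,
Exporter, and export() are hypothetical names for illustration and are not Hive's
ReplicationSpec, TableExport, or analyzer classes. The point is only that the caller resolves
the event ID first and the exporter just reads it.

```java
public class ExportFlowSketch {

    /** Hypothetical stand-in for a replication spec; NOT Hive's ReplicationSpec. */
    static final class Spec {
        private final String currentEventId;

        Spec(String currentEventId) {
            this.currentEventId = currentEventId;
        }

        String currentEventId() {
            return currentEventId;
        }
    }

    /** Hypothetical exporter: it reads the spec it is given and never mutates it. */
    static final class Exporter {
        private final Spec spec;

        Exporter(Spec spec) {
            this.spec = spec;
        }

        String write(String table) {
            // Tag the dump with the event ID that the caller already decided on.
            return table + "@" + spec.currentEventId();
        }
    }

    /** The caller (analyzer or dump-task role) resolves the event ID before exporting. */
    static String export(String table, long latestEventId) {
        Spec spec = new Spec(Long.toString(latestEventId));
        return new Exporter(spec).write(table);
    }

    public static void main(String[] args) {
        System.out.println(export("t1", 11L));
    }
}
```

Keeping the spec immutable inside the exporter makes it obvious which component owns the
event ID, which is the design choice the comment argues for.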

> IMPORT table doesn't load from data dump if a metadata-only dump was already imported.
> --------------------------------------------------------------------------------------
>                 Key: HIVE-17367
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Import/Export, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, replication
>             Fix For: 3.0.0
>         Attachments: HIVE-17367.01.patch, HIVE-17367.02.patch
> Repl v1 creates a set of EXPORT/IMPORT commands to replicate modified data (as per events)
> across clusters.
> For instance, let's say an insert generates 2 events, such as
> INSERT (ID: 11)
> Each event generates a set of EXPORT and IMPORT commands.
> The ALTER_TABLE event generates a metadata-only export/import.
> The INSERT event generates a metadata+data export/import.
> As Hive always dumps the latest copy of the table during export, it sets the latest
> notification event ID as the table's current state. So, in this example, the import of
> metadata by the ALTER_TABLE event sets the current state of the table to 11.
> Now, when we try to import the data dumped by the INSERT event, it is a no-op because the
> table's current state (11) is equal to the dump state (11), which in turn means the data
> never gets replicated to the target cluster.
> So, it is necessary to allow overwrite of a table/partition if its current state equals
> the dump state.
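
The state comparison described in the quoted issue can be sketched as follows. This is a
minimal illustration under stated assumptions, not Hive's actual code: the class and method
names are hypothetical, and event IDs are modelled as plain long values.

```java
public class ReplStateCheck {

    /**
     * Behaviour as described in the issue: an import replaces the target only
     * when the dump state is strictly newer than the table's current state.
     */
    static boolean allowReplacementStrict(long currentState, long dumpState) {
        return dumpState > currentState;
    }

    /**
     * Proposed behaviour: also allow the overwrite when the two states are equal,
     * so a data dump can follow a metadata-only import with the same event ID.
     */
    static boolean allowReplacementInclusive(long currentState, long dumpState) {
        return dumpState >= currentState;
    }

    public static void main(String[] args) {
        long tableState = 11L; // set by the earlier metadata-only import
        long dumpState = 11L;  // state recorded in the INSERT event's data dump

        System.out.println("strict:    " + allowReplacementStrict(tableState, dumpState));
        System.out.println("inclusive: " + allowReplacementInclusive(tableState, dumpState));
    }
}
```

With both states at 11, the strict check rejects the data import (the no-op described in the
issue), while the inclusive check lets it through. A dump older than the current state is
still rejected by both.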

