hive-issues mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
Date Thu, 16 Apr 2015 10:54:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497895#comment-14497895 ]

Alan Gates commented on HIVE-10228:
-----------------------------------

Wow, when I saw it was a 150K patch I was hoping it was mostly generated code.  No such luck.

Code level comments on review board, higher level below:

This needs substantial documentation, since you're introducing a new concept: a table being
replicated, or generated from replication. Is there a doc JIRA for the replication work yet?
If so, we should link it to this JIRA.

Parser changes:
I don't understand why DROP TABLE needs the replication clause.  As far as I can tell from
the changes in DDLSemanticAnalyzer this is semantically equivalent to IF EXISTS.  Why not
use that?

Adding METADATA and REPLICATION as keywords is not backwards compatible.  We either need to
explicitly note that in this JIRA or add them to the list of reserved keywords allowed as
identifiers in IdentifiersParser.g.  I suspect the latter is a better choice.

> Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-10228
>                 URL: https://issues.apache.org/jira/browse/HIVE-10228
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Import/Export
>    Affects Versions: 1.2.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-10228.2.patch, HIVE-10228.3.patch, HIVE-10228.patch
>
>
> We need to update a couple of hive commands to support replication semantics. To wit,
> we need the following:
> EXPORT ... [FOR [METADATA] REPLICATION("comment")]
> Export will now support an extra optional clause indicating that the export is being
> prepared for replication. Within that clause, an additional optional METADATA keyword
> makes the export metadata-only, for cases such as capturing the diff for ALTER statements.
> Also, when exporting for replication, a missing table, or a table that is a view, offline,
> or non-native, is not considered an error; the export instead succeeds as a no-op.
> IMPORT ... (as normal) – but handles new semantics 
> There are no syntax changes for import, but import will have to handle every permutation
> of export dump that is possible. Import must also update the object only if the update
> being imported is not older than the object's current state. Finally, import currently
> does not work with a dbname.tablename specification; this should be fixed.
> DROP TABLE ... FOR REPLICATION('eventid')
> Drop Table now has an additional clause specifying that the drop is being done for
> replication purposes, and that the drop should not actually remove the table if the
> table is newer than the specified event id.
> ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')
> Similarly, Drop Partition also has an equivalent change to Drop Table.
>
> In addition, we introduce a new property "repl.last.id" which, when set in the table or
> partition properties on a replication destination, holds the effective "state identifier"
> of the object.
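The gating rules in the description above (export no-ops, import only when not older, drop only when the table is not newer than the event id) can be sketched as follows. This is a minimal Python illustration, not Hive's implementation: the Table class and all function names are hypothetical, event ids are assumed to be integers comparable with "repl.last.id", and treating an object with no recorded "repl.last.id" as replaceable is an assumption the description does not settle.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Property name taken from the issue description.
REPL_LAST_ID = "repl.last.id"


@dataclass
class Table:
    """Hypothetical stand-in for a table or partition on the destination."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)
    is_view: bool = False
    is_offline: bool = False
    is_native: bool = True


def repl_state(obj: Table) -> Optional[int]:
    """Return the object's repl.last.id as an int, or None if it was never set."""
    value = obj.params.get(REPL_LAST_ID)
    return int(value) if value else None


def should_drop_for_replication(obj: Optional[Table], event_id: int) -> bool:
    """DROP TABLE / DROP PARTITION ... FOR REPLICATION('eventid')."""
    if obj is None:
        return False  # nothing to drop: a successful no-op
    current = repl_state(obj)
    if current is None:
        return True  # assumption: no recorded state means the drop may proceed
    return current <= event_id  # skip the drop if the object is newer than the event


def should_import(existing: Optional[Table], incoming_id: int) -> bool:
    """IMPORT: apply the update only if it is not older than the current state."""
    if existing is None:
        return True
    current = repl_state(existing)
    return current is None or incoming_id >= current


def export_is_noop(obj: Optional[Table], for_replication: bool) -> bool:
    """EXPORT ... FOR REPLICATION: skip missing/view/offline/non-native tables.

    Returns False for a normal export, where the caller would still raise
    an error for a missing table as before.
    """
    if not for_replication:
        return False
    return obj is None or obj.is_view or obj.is_offline or not obj.is_native
```

For example, a table tagged with repl.last.id = 100 survives a replicated drop for event 90 but is dropped for event 100, and an import carrying state 99 is ignored. Keeping these checks as small predicates separates the "is this event stale?" decision from the DDL execution itself.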



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
