hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-21771) Support partition filter (where clause) in REPL dump command (Bootstrap Dump)
Date Wed, 17 Jul 2019 16:24:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-21771:
----------------------------------
    Labels: pull-request-available  (was: )

> Support partition filter (where clause) in REPL dump command (Bootstrap Dump)
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-21771
>                 URL: https://issues.apache.org/jira/browse/HIVE-21771
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21771.01.patch, HIVE-21771.02.patch
>
>
> *Bootstrap for managed table*
> Users should be allowed to execute REPL DUMP with a where clause. The where clause should
support filtering out partitions from the dump. The format of the where clause should be similar to
*"REPL DUMP dbname from 10 where 't0' where key < 10,'t1'* where key = 3, '(t2*)|'t3' where
key > 3".* For the initial version, only very basic filter conditions will be supported; more
complex conditions will be added later as and when required.
>  * From the AST generated for the where clause, extract the table information.
>  * Generate AST for each table.
>  * List the partitions of each table from its per-table AST, using the same metastore API
  that select queries use.
>  * During bootstrap dump, use the partition list to dump the partitions.
>  * During incremental dump, use the list to filter out events.
> In case of bootstrap dump, all the tables of the database will be scanned and:
>  * If the table is not partitioned, then it will be dumped.
>  * If the key provided in the filter condition for the table is not a partition column, then
the dump will fail.
>  * If the table is not mentioned in the where clause, then all partitions of the table will
be dumped.
>  * Otherwise, all partitions of the table that satisfy the where clause will be dumped.
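
The bootstrap rules above can be sketched as a small decision function. This is only an illustration in Python, not the patch's code: `Table`, `Filter`, and `partitions_to_dump` are hypothetical names, the filter is modelled as a single predicate on one integer key, and the rules are checked in the order they are listed (so an unpartitioned table is dumped even if a filter names it, which is an assumption).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

PartitionSpec = Dict[str, int]  # partition column -> value (ints only, for brevity)


@dataclass
class Table:
    """Hypothetical stand-in for the table metadata a metastore would return."""
    name: str
    partition_cols: List[str] = field(default_factory=list)
    partitions: List[PartitionSpec] = field(default_factory=list)


@dataclass
class Filter:
    """Hypothetical form of one per-table condition, e.g. 'key < 10'."""
    key: str
    predicate: Callable[[int], bool]


def partitions_to_dump(table: Table, flt: Optional[Filter]) -> Optional[List[PartitionSpec]]:
    """Return the partitions to dump; None means 'dump the whole unpartitioned table'."""
    if not table.partition_cols:
        return None  # unpartitioned table: always dumped
    if flt is None:
        return list(table.partitions)  # table not mentioned in the where clause
    if flt.key not in table.partition_cols:
        # filter key is not a partition column: the dump fails
        raise ValueError(f"{flt.key!r} is not a partition column of {table.name!r}")
    # keep only the partitions that satisfy the condition
    return [p for p in table.partitions if flt.predicate(p[flt.key])]


t0 = Table("t0", ["key"], [{"key": 5}, {"key": 10}, {"key": 15}])
print(partitions_to_dump(t0, Filter("key", lambda v: v < 10)))  # [{'key': 5}]
```

Returning `None` for unpartitioned tables keeps "dump the table itself" distinct from "dump an empty list of partitions"; a real implementation would presumably work on the AST and metastore objects instead.
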
> *Incremental for managed table (Not part of this patch)*
> In case of incremental dump, the events from the notification log will be scanned and,
once the partition spec is extracted from an event, the partition spec will be checked
against the filter condition.
>  * If the table is not partitioned, then the event will be added to the dump.
>  * If the key mentioned is not a partition column, then the dump will fail.
>  * If the table is not mentioned in the filter, then the event will be added to the dump.
>  * If the event covers multiple partitions, then the event will be added to the dump. (Filtering
out redundant partitions from the message will be done as part of a separate task.)
>  * If the partition spec matches the filter, then the event will be added to the dump.
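
The incremental rules can be sketched the same way. Again this is a hedged Python illustration, not code from the patch (which excludes the incremental case): `include_event` and its parameters are hypothetical, the filter is a single predicate on one integer key, and two points are assumptions because the description does not state them: a single-partition event that does not match the filter is dropped, and an event carrying no partition spec is kept.

```python
from typing import Callable, Dict, List, Optional

PartitionSpec = Dict[str, int]  # partition column -> value (ints only, for brevity)


def include_event(
    partition_cols: List[str],
    event_partitions: List[PartitionSpec],
    filter_key: Optional[str],
    predicate: Optional[Callable[[int], bool]],
) -> bool:
    """Decide whether one notification-log event is added to the incremental dump."""
    if not partition_cols:
        return True  # table is not partitioned
    if filter_key is None or predicate is None:
        return True  # table is not mentioned in the filter
    if filter_key not in partition_cols:
        # filter key is not a partition column: the dump fails
        raise ValueError(f"{filter_key!r} is not a partition column")
    if len(event_partitions) != 1:
        # multi-partition events are kept whole; the zero-partition
        # case is an assumption, not stated in the description
        return True
    # single partition: keep the event only if its spec matches the filter
    return predicate(event_partitions[0][filter_key])


print(include_event(["key"], [{"key": 3}], "key", lambda v: v == 3))  # True
print(include_event(["key"], [{"key": 4}], "key", lambda v: v == 3))  # False
```
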
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
