hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables
Date Thu, 01 Mar 2018 23:44:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-18814:
----------------------------------
    Description: 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

The Add Partition command creates a {{Partition}} metadata object and sets its location to
the directory containing the data files.

In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and at read time
the data is decorated with ROW__ID, but the original transaction is 0. I suspect that in
earlier Hive versions this will throw or return no data.
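
To make the symptom concrete, here is a simplified Python model (not Hive code) of a reader decorating rows from a file that sits outside any delta_x_x/ directory. The RowId fields and the decorate_original_file() helper are hypothetical; the only detail taken from the description is that such rows get 0 as their original transaction/write id, so they are not tied to any real write.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class RowId:
    """Hypothetical stand-in for the ROW__ID struct: (write id, bucket, row ordinal)."""
    writeid: int
    bucketid: int
    rowid: int


def decorate_original_file(rows: List[tuple], bucket: int) -> Iterator[Tuple[RowId, tuple]]:
    """Attach synthetic ROW__IDs to rows from a file that is not in a delta_x_x/ dir.

    The file carries no write id of its own, so 0 is used; the row id is simply
    the row's position in the file (a simplification made for this sketch).
    """
    for ordinal, row in enumerate(rows):
        yield RowId(writeid=0, bucketid=bucket, rowid=ordinal), row


# Rows from a file attached via Add Partition, read back through the sketch.
decorated = list(decorate_original_file([("a", 1), ("b", 2)], bucket=0))
for row_id, row in decorated:
    print(row_id, row)
```

In this model every added row reports write id 0, regardless of when the partition was added.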

One option is to follow the Load Data approach: create a new delta_x_x/ directory and
move/copy the data there.
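
A minimal sketch of that first option, assuming a local filesystem in place of HDFS and a write id that has already been allocated. The directory name uses the delta_<min>_<max> pattern with 7-digit zero padding; this is an assumption modeled on Hive's delta naming (some versions also append a statement id, omitted here). add_partition_via_delta() is a hypothetical helper, not a Hive API.

```python
import os
import shutil
import tempfile


def delta_dir_name(write_id: int) -> str:
    # Assumed naming: delta_<minWriteId>_<maxWriteId>, zero-padded to 7 digits.
    # A single Load Data-style write uses the same id for both bounds.
    return f"delta_{write_id:07d}_{write_id:07d}"


def add_partition_via_delta(src_dir: str, partition_dir: str, write_id: int,
                            move: bool = False) -> str:
    """Copy (or move) the files in src_dir into a new delta dir under partition_dir."""
    target = os.path.join(partition_dir, delta_dir_name(write_id))
    os.makedirs(target)  # fails if the dir exists: a new write id needs a new dir
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # nested directories are ignored in this sketch
        dst = os.path.join(target, name)
        if move:
            shutil.move(src, dst)
        else:
            shutil.copy2(src, dst)
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "external")   # data the user points Add Partition at
    part = os.path.join(tmp, "t", "p=1")  # partition dir inside the table tree
    os.makedirs(src)
    os.makedirs(part)
    open(os.path.join(src, "000000_0"), "w").close()
    created = add_partition_via_delta(src, part, write_id=5)
    created_name = os.path.basename(created)
    copied = os.listdir(created)
    source_still_present = os.path.exists(os.path.join(src, "000000_0"))
```

Copying leaves the user's files untouched; move=True avoids the copy cost but takes the files away from their original location.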

Another is to allocate a new writeid and save it in the Partition metadata. This writeid
could then be used to decorate the data with ROW__IDs. This avoids the move/copy but leaves
the data "outside" the table tree, which makes it more likely that the data will be modified
in some way. That can really break things if it happens after SQL update/delete statements
have been run on this data.
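
A rough model of the second option, showing where the write id would live and how a reader would use it. Everything here is hypothetical: the Partition dataclass stands in for the metastore object, WriteIdAllocator replaces the metastore's write id allocation, and the parameter key WRITEID_PARAM is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Invented partition parameter key, used only in this sketch.
WRITEID_PARAM = "example.addpartition.writeid"


@dataclass
class Partition:
    values: Tuple[str, ...]
    location: str  # may point outside the table tree
    parameters: Dict[str, str] = field(default_factory=dict)


class WriteIdAllocator:
    """Stand-in for per-table write id allocation; ids increase monotonically."""

    def __init__(self) -> None:
        self._next = 1

    def allocate(self) -> int:
        write_id = self._next
        self._next += 1
        return write_id


def add_partition(values: List[str], location: str, allocator: WriteIdAllocator) -> Partition:
    """Record the partition and remember the write id it was added under."""
    part = Partition(values=tuple(values), location=location)
    part.parameters[WRITEID_PARAM] = str(allocator.allocate())
    return part


def decorate(part: Partition, rows: List[tuple],
             bucket: int) -> Iterator[Tuple[Tuple[int, int, int], tuple]]:
    """Build (write id, bucket, row ordinal) ROW__ID-like tuples from partition metadata."""
    write_id = int(part.parameters.get(WRITEID_PARAM, "0"))  # 0 if never recorded
    for ordinal, row in enumerate(rows):
        yield (write_id, bucket, ordinal), row


allocator = WriteIdAllocator()
allocator.allocate()  # pretend an earlier insert already used write id 1
p = add_partition(["2018-03-01"], "/data/outside/table", allocator)
row_ids = [rid for rid, _ in decorate(p, [("x",), ("y",)], bucket=0)]
```

Note that p.location still points outside the table tree; nothing in this model stops the files there from being rewritten later, which is the risk described in the paragraph above.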

Add Partition performs no validation (except of the partition spec), so any file in any
format can be added. It also allows adding partitions to bucketed tables.
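
For comparison, here is a sketch of the kind of checks Add Partition could run on an acid table but currently does not. Two assumptions are made: full acid tables store data as ORC (ORC files start with the 3-byte magic "ORC"), and bucket files follow a 000000_0-style naming pattern. validate_add_partition() and the regex are illustrative, not existing Hive code, and a magic-byte check is only a sanity check, not full format validation.

```python
import os
import re
import tempfile
from typing import List

ORC_MAGIC = b"ORC"  # every ORC file begins with these three bytes
# Assumed bucket file naming: <6-digit bucket number>_<copy/attempt number>.
BUCKET_FILE = re.compile(r"^(\d{6})_\d+$")


def validate_add_partition(location: str, num_buckets: int = 0) -> List[str]:
    """Return the problems found in location; an empty list means all checks passed.

    num_buckets == 0 means the table is not bucketed, so file names are not checked.
    """
    problems = []
    names = sorted(n for n in os.listdir(location)
                   if os.path.isfile(os.path.join(location, n)))
    for name in names:
        with open(os.path.join(location, name), "rb") as fh:
            if fh.read(len(ORC_MAGIC)) != ORC_MAGIC:
                problems.append(f"{name}: not an ORC file")
        if num_buckets:
            match = BUCKET_FILE.match(name)
            if match is None:
                problems.append(f"{name}: does not follow bucket file naming")
            elif int(match.group(1)) >= num_buckets:
                problems.append(f"{name}: bucket {int(match.group(1))} >= {num_buckets}")
    return problems


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "000000_0"), "wb") as fh:
        fh.write(ORC_MAGIC + b"\x00" * 8)  # looks like ORC, valid bucket 0
    with open(os.path.join(d, "data.csv"), "wb") as fh:
        fh.write(b"a,b\n")  # wrong format and wrong name
    issues = validate_add_partition(d, num_buckets=2)
```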

This seems like a very dangerous command. A better option may be to block it and advise
using Load Data instead. Alternatively, make it perform the Add Partition metadata operation
followed by Load Data.
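
The two alternatives can be sketched as a planning step that runs before any metadata is written. The policy enum, exception type, and step names below are hypothetical; the point is only the ordering in the second option: the partition is first registered at a location inside the table tree, then the user's files go through the Load Data path, which is what would put them in a delta_x_x/ directory.

```python
from enum import Enum
from typing import List, Tuple

Step = Tuple[str, str]  # (operation, path)


class AcidAddPartitionPolicy(Enum):
    BLOCK = "block"                            # reject and point users at Load Data
    METADATA_THEN_LOAD = "metadata_then_load"  # rewrite into metadata op + Load Data


class UnsupportedOperation(Exception):
    pass


def plan_add_partition(is_acid: bool, table_location: str, partition_name: str,
                       user_location: str, policy: AcidAddPartitionPolicy) -> List[Step]:
    """Return the ordered steps for Add Partition under the chosen policy."""
    if not is_acid:
        # Unchanged behavior: just register the user's directory as the partition.
        return [("add_partition_metadata", user_location)]
    if policy is AcidAddPartitionPolicy.BLOCK:
        raise UnsupportedOperation(
            "Add Partition is not supported on acid tables; use Load Data instead")
    partition_dir = f"{table_location}/{partition_name}"
    return [
        ("add_partition_metadata", partition_dir),  # location inside the table tree
        ("load_data", user_location),               # files land in a new delta dir
    ]


plan = plan_add_partition(True, "/warehouse/t", "p=1", "/staging/p1",
                          AcidAddPartitionPolicy.METADATA_THEN_LOAD)
```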

  was:
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

The Add Partition command creates a {{Partition}} metadata object and sets its location to
the directory containing the data files.

In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and at read time
the data is decorated with ROW__ID, but the original transaction is 0. I suspect that in
earlier Hive versions this will throw or return no data.

One option is to follow the Load Data approach: create a new delta_x_x/ directory and
move/copy the data there.

Another is to allocate a new writeid and save it in the Partition metadata. This writeid
could then be used to decorate the data with ROW__IDs. This avoids the move/copy but leaves
the data "outside" the table tree, which makes it more likely that the data will be modified
in some way. That can really break things if it happens after SQL update/delete statements
have been run on this data.

> Support Add Partition For Acid tables
> -------------------------------------
>
>                 Key: HIVE-18814
>                 URL: https://issues.apache.org/jira/browse/HIVE-18814
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
