kudu-user mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: swap data in Kudu table
Date Fri, 23 Feb 2018 22:13:50 GMT
You guys are awesome, thanks!

Todd, I like the ALTER TABLE TBLPROPERTIES idea; I will test it next week.
Views might work as well, but for a number of reasons I want to keep them as
my last resort :)

On Fri, Feb 23, 2018 at 4:32 PM, Todd Lipcon <todd@cloudera.com> wrote:

> A couple other ideas from the Impala side:
> - could you use a view and alter the view to point to a different table?
> Then all readers would be pointed at the view, and security permissions
> could be on that view rather than the underlying tables?
> - I think if you use an external table in Impala you could use an ALTER
> TABLE TBLPROPERTIES ... statement to change kudu.table_name to point to a
> different table. Then issue a 'refresh' on the impalads so that they load
> the new metadata. Subsequent queries would hit the new underlying Kudu
> table, but permissions and stats would be unchanged.
> -Todd
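Both of Todd's suggestions come down to a couple of Impala statements issued as the last step of a reload job. The Python below is a minimal sketch that only builds those SQL strings. It assumes that the production table is an external Impala table mapped onto Kudu (first helper), or that readers query a view (second helper). The helper names and the example table names are hypothetical, and actually executing the statements through an Impala client is left out.

```python
import re

# Hypothetical helpers: they only build SQL text and do not talk to Impala.
# Allow dots and colons so Kudu names such as "impala::db.orders_v2" pass.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.:]*$")


def _check(name: str) -> str:
    """Reject names that could break out of the generated SQL."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def repoint_external_table(impala_table: str, kudu_table: str) -> list[str]:
    """Remap an external Impala table to a different underlying Kudu table.

    The Impala table keeps its name, so grants and stats defined on it stay.
    """
    return [
        f"ALTER TABLE {_check(impala_table)} "
        f"SET TBLPROPERTIES ('kudu.table_name' = '{_check(kudu_table)}')",
        # As suggested above, refresh so the impalads load the new metadata.
        f"REFRESH {impala_table}",
    ]


def repoint_view(view: str, table: str) -> list[str]:
    """Redefine a view so that its readers see a different table."""
    return [f"ALTER VIEW {_check(view)} AS SELECT * FROM {_check(table)}"]
```

For example, `repoint_external_table("db.orders", "impala::db.orders_v2")` yields the `ALTER TABLE db.orders SET TBLPROPERTIES (...)` statement followed by `REFRESH db.orders`. The name check is a small guard against building malformed SQL from job parameters. It is not a complete identifier grammar.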
> On Fri, Feb 23, 2018 at 1:16 PM, Mike Percy <mpercy@apache.org> wrote:
>> Hi Boris, those are good ideas. Currently Kudu does not have atomic bulk
>> load capabilities or staging abilities. Theoretically renaming a partition
>> atomically shouldn't be that hard to implement, since it's just a master
>> metadata operation which can be done atomically, but it's not yet
>> implemented.
>> There is a JIRA to track a generic bulk load API here:
>> https://issues.apache.org/jira/browse/KUDU-1370
>> Since I couldn't find anything to track the specific features you
>> mentioned, I just filed the following improvement JIRAs so we can track it:
>>    - KUDU-2326: Support atomic bulk load operation
>>    <https://issues.apache.org/jira/browse/KUDU-2326>
>>    - KUDU-2327: Support atomic swap of tables or partitions
>>    <https://issues.apache.org/jira/browse/KUDU-2327>
>> Mike
>> On Thu, Feb 22, 2018 at 6:39 AM, Boris Tyukin <boris@boristyukin.com>
>> wrote:
>>> Hello,
>>> I am trying to figure out the best and safest way to swap data in a
>>> production Kudu table with data from a staging table.
>>> Basically, once in a while we need to perform a full reload of some
>>> tables (once every few months). These tables are pretty large, with
>>> billions of rows, and we want to minimize the risk and downtime for users
>>> if something goes wrong in the middle of that process.
>>> With Hive and Impala on HDFS, we can use a very handy command, LOAD
>>> DATA INPATH. We can prepare the data for the reload in a staging table
>>> upfront, and this process might take many hours. Once the staging table is
>>> ready, we issue the LOAD DATA INPATH command, which moves the underlying
>>> HDFS files into the production table. This operation is almost instant and
>>> is the very last step in our pipeline.
>>> Alternatively, we can swap partitions using ALTER TABLE EXCHANGE
>>> PARTITION command.
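The two HDFS-side final steps Boris describes can be sketched as the statements below. This is a minimal Python sketch that only builds the SQL strings. The helper names, paths, and partition spec are hypothetical, and it assumes OVERWRITE is wanted because the job is a full reload. Hive's documentation describes EXCHANGE PARTITION as moving a partition into a table that does not already have it, so the sketch drops the old production partition first. That leaves a brief window in which the partition is missing.

```python
def hdfs_reload_statement(prod_table: str, staging_dir: str) -> str:
    """Final step of an HDFS reload: move the staged files into the table.

    OVERWRITE replaces the table's current files, which suits a full reload.
    """
    if "'" in staging_dir:
        raise ValueError("staging_dir must not contain quotes")
    return f"LOAD DATA INPATH '{staging_dir}' OVERWRITE INTO TABLE {prod_table}"


def exchange_partition_statements(prod_table: str, staging_table: str,
                                  spec: dict[str, str]) -> list[str]:
    """Move one partition from the staging table into the production table."""
    parts = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return [
        # The destination must not already hold this partition.
        f"ALTER TABLE {prod_table} DROP IF EXISTS PARTITION ({parts})",
        # Moves the partition out of staging_table and into prod_table.
        f"ALTER TABLE {prod_table} EXCHANGE PARTITION ({parts}) "
        f"WITH TABLE {staging_table}",
    ]
```

Both operations only move files and update metastore entries, which is why they are close to instant. Kudu has no equivalent to either one.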
>>> Now with Kudu, I cannot seem to find a good strategy. The only thing
>>> that came to my mind is to drop the production table and rename the
>>> staging table to the production table's name as the last step of the job,
>>> but in this case we are going to lose statistics and security permissions.
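The drop-and-rename fallback, and the cleanup it forces, can be sketched as follows. This minimal Python sketch only builds the SQL strings. It assumes Impala with role-based grants (Sentry-style `GRANT ... TO ROLE`), and the list of (privilege, role) pairs to restore is a hypothetical input that the job would have to record before the drop. Queries against the production table would fail in the window between the DROP and the RENAME.

```python
def drop_and_rename_statements(prod_table: str, staging_table: str,
                               grants: list[tuple[str, str]]) -> list[str]:
    """Replace prod_table with staging_table, then restore stats and grants."""
    stmts = [
        # From here until the RENAME, queries on prod_table would fail.
        f"DROP TABLE IF EXISTS {prod_table}",
        f"ALTER TABLE {staging_table} RENAME TO {prod_table}",
        # The statistics belonged to the dropped table, so recompute them.
        f"COMPUTE STATS {prod_table}",
    ]
    # Hypothetical (privilege, role) pairs recorded from the old table.
    for privilege, role in grants:
        stmts.append(f"GRANT {privilege} ON TABLE {prod_table} TO ROLE {role}")
    return stmts
```

The trailing COMPUTE STATS and GRANT statements are exactly the cost of this approach. Remapping an external table, as Todd suggests, avoids them because the Impala table itself is never dropped.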
>>> Any other ideas?
>>> Thanks!
>>> Boris
> --
> Todd Lipcon
> Software Engineer, Cloudera
