hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mich Talebzadeh" <>
Subject RE: Importing into a hive database with minimal unavailability or renaming a database
Date Fri, 18 Dec 2015 20:56:59 GMT
Hi Marcin,



If the DDL update to the main table involves a new column, then at the moment Hive does not
support adding column. Yes the schema in metastore can change but the file system will not
allow you to add values to the new column.


1.    Thus as discussed in “Adding a new column to a table and updating it” thread, The
easiest option is to create a new table with the new column and do insert/select from the
existing table with values set for the new column

2.    So the classic ETL issue. The way I approach would be to use sqoop to import the key
column + changed columns to a staging table. Create a new table with the correct DDL in Hive.
Insert into this new table using unchanged columns from the existing Hive table and changed
columns from the staging table where the join will have source_table.KEY = staging_table.KEY.
This would be faster than recreating and populating the original table. Once the job done
move/drop the original table and rename the new table to the original table.


Remember this follows the CAP theorem. Data will be eventually consistent. However, Availability
is more important than Consistency. I hope this is correct.






Mich Talebzadeh


Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly <> 


NOTE: The information in this email is proprietary and confidential. This message is for the
designated recipient only, if you are not the intended recipient, you should destroy it immediately.
Any information in this message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility
of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd,
its subsidiaries nor their employees accept any responsibility.


From: Marcin Tustin [] 
Sent: 18 December 2015 14:15
Subject: Importing into a hive database with minimal unavailability or renaming a database


Hi All,


We import our production database into hive on a schedule using sqoop. Unfortunately, sqoop
won't update the table schema in hive when the table schema has changed in the source database.


Accordingly, to get updates to the table schema we drop the hive table first.


Unfortunately, this causes the data to be unavailable in hive for a certain period of time.


Accordingly, I'd like to know how people on this list have tackled the issue. Is there a way
to get sqoop to update the table schema in hive, or can we import into a staging hive database
and rename it?





Want to work at Handy? Check out our culture deck and open roles <>

Latest news <>  at Handy

Handy just raised $50m <>
 led by Fidelity


View raw message