hive-user mailing list archives

From Ashok Kumar <>
Subject Re: Immutable data in Hive
Date Wed, 30 Dec 2015 18:20:42 GMT
Thank you sir, very helpful. Could you also briefly describe, from your experience, the major
differences between traditional ETL in a DW and ELT in Hive? Why is there an emphasis on taking
data from traditional transactional databases into Hive tables in the same format and doing
the transform in Hive afterwards? Is it because Hive is meant to be efficient at data transformation? Regards

    On Wednesday, 30 December 2015, 18:00, Alan Gates <> wrote:

 Traditionally data in Hive was write once (insert), read many.  You could append to tables
and partitions, add new partitions, etc.  You could remove data by dropping tables or partitions. 
But there were no updates or deletes of particular rows.  This is what was meant
by immutable.  Hive was originally built this way because it was based on MapReduce and HDFS,
and these were the natural semantics given those underlying systems.
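To make that concrete, here is a minimal sketch of a classic append-only Hive table. The table and column names (web_logs, staging_logs, etc.) are illustrative, not from the thread:

```sql
-- An append-only ("immutable") Hive table: data gets in via inserts
-- and new partitions, and out only at partition or table granularity.
CREATE TABLE web_logs (
  ts  STRING,
  url STRING
)
PARTITIONED BY (dt STRING)
STORED AS TEXTFILE;

-- Appending data or adding partitions is fine:
INSERT INTO TABLE web_logs PARTITION (dt='2015-12-30')
SELECT ts, url FROM staging_logs;

-- Removing data works only wholesale, by dropping a partition or table:
ALTER TABLE web_logs DROP PARTITION (dt='2015-11-01');

-- But there is no UPDATE or DELETE of individual rows on such a table.
```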

For many use cases (e.g. ETL) this is sufficient, and the vast majority of people still run
Hive this way.

We added transactions, updates, and deletes to Hive because some use cases require these
features.  Hive is being used more and more as a data warehouse, and while updates and deletes
are less common there, they are still required (slowly changing dimensions, fixing wrong data,
deleting records for compliance, etc.).  Also, streaming data into warehouses from transactional
systems is a common use case.
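For comparison, a sketch of a transactional table that allows row-level changes, assuming Hive 0.14+ with the transaction manager enabled (hive.txn.manager set to DbTxnManager); the dim_customer table is a hypothetical example:

```sql
-- ACID tables must be bucketed and stored as ORC, with the
-- transactional table property set.
CREATE TABLE dim_customer (
  id   INT,
  name STRING,
  city STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level mutations are now permitted, e.g. for a slowly
-- changing dimension or a compliance delete:
UPDATE dim_customer SET city = 'London' WHERE id = 42;
DELETE FROM dim_customer WHERE id = 7;
```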


    Ashok Kumar  December 29, 2015 at 14:59  Hi,
Can someone please clarify what "immutable data" in Hive means?
I have been told that data in Hive is/should be immutable, but in that case why do we need transactional
tables in Hive that allow updates to data?
thanks and greetings 
