hive-dev mailing list archives

From "Rajat Venkatesh (JIRA)" <>
Subject [jira] [Created] (HIVE-8467) Table Copy - Background, incremental data load
Date Wed, 15 Oct 2014 16:00:38 GMT
Rajat Venkatesh created HIVE-8467:

             Summary: Table Copy - Background, incremental data load
                 Key: HIVE-8467
             Project: Hive
          Issue Type: New Feature
            Reporter: Rajat Venkatesh

Traditionally, Hive and other tools in the Hadoop ecosystem haven't required a load stage.
However, with recent developments, Hive is much more performant when data is stored in specific
formats such as ORC, Parquet, and Avro. Technologies like Presto also work much better with
certain data formats. At the same time, data is generated or obtained from 3rd parties in
non-optimal formats such as CSV, tab-delimited, or JSON. In many cases, it is not an option to change
the data format at the source. We've found that users either use sub-optimal formats or spend
a large amount of effort creating and maintaining copies. We want to propose a new construct
- Table Copy - to help “load” data into an optimal storage format.
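The design itself lives in the attached PDF and is not reproduced here. As a purely hypothetical illustration of what the "background, incremental" part of the summary could involve, the Python sketch below tracks which source files (for example CSV files) have already been copied, assuming file path plus modification time as the change signal, so that a background job only converts files that are new or modified since its last run. The names `SourceFile`, `files_to_copy`, `run_incremental_copy`, and the `convert` callback are invented for this example; they are not part of Hive or of the proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical description of one raw input file (e.g. a CSV file)."""
    path: str
    mtime: int  # assumed change signal: modification time in epoch seconds


def files_to_copy(source_files: Iterable[SourceFile],
                  manifest: Dict[str, int]) -> List[SourceFile]:
    """Return files that are new or changed since they were last copied.

    `manifest` maps path -> mtime recorded when that file was last copied.
    """
    pending = []
    for f in source_files:
        copied_mtime = manifest.get(f.path)
        # Never copied before, or modified after the last copy.
        if copied_mtime is None or f.mtime > copied_mtime:
            pending.append(f)
    return sorted(pending, key=lambda f: f.path)


def run_incremental_copy(source_files: Iterable[SourceFile],
                         manifest: Dict[str, int],
                         convert: Callable[[SourceFile], None]) -> Dict[str, int]:
    """One background pass: convert pending files and return an updated manifest.

    `convert` is a hypothetical callback that would rewrite a file into the
    optimal storage format (e.g. ORC). The input manifest is not mutated.
    """
    updated = dict(manifest)
    for f in files_to_copy(source_files, manifest):
        convert(f)
        # Record each file only after its conversion succeeded, so a failed
        # pass is retried for the remaining files on the next run.
        updated[f.path] = f.mtime
    return updated
```

A scheduler could call `run_incremental_copy` periodically and persist the returned manifest; on each pass only the delta is converted, which is the kind of behavior that distinguishes a background, incremental copy from a one-off bulk rewrite.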

I am going to attach a PDF document with a lot more details, especially addressing how this is
different from bulk loads in relational DBs or materialized views.

Looking forward to hearing whether others see a similar need to formalize the conversion of data to different
storage formats. If so, are the details in the PDF document a good start?

This message was sent by Atlassian JIRA
