hive-dev mailing list archives

From "Rajat Venkatesh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8467) Table Copy - Background, incremental data load
Date Wed, 15 Oct 2014 16:01:33 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajat Venkatesh updated HIVE-8467:
----------------------------------
    Attachment: Table Copies.pdf

> Table Copy - Background, incremental data load
> ----------------------------------------------
>
>                 Key: HIVE-8467
>                 URL: https://issues.apache.org/jira/browse/HIVE-8467
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rajat Venkatesh
>         Attachments: Table Copies.pdf
>
>
> Traditionally, Hive and other tools in the Hadoop eco-system haven't required a load stage.
> However, with recent developments, Hive is much more performant when data is stored in specific
> formats like ORC, Parquet, Avro etc. Technologies like Presto also work much better with
> certain data formats. At the same time, data is generated or obtained from 3rd parties in
> non-optimal formats such as CSV, tab-delimited or JSON. Often it is not an option to change
> the data format at the source. We've found that users either use sub-optimal formats or spend
> a large amount of effort creating and maintaining copies. We want to propose a new construct
> - Table Copy - to help “load” data into an optimal storage format.
> I am going to attach a PDF document with a lot more details, especially addressing how
> this differs from bulk loads in relational DBs or materialized views.
> Looking forward to hearing whether others see a similar need to formalize conversion of data
> to different storage formats. If yes, are the details in the PDF document a good start?
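For context, the manual copy-maintenance pattern the proposal aims to formalize can be sketched in HiveQL. Table names, columns, paths, and the incremental predicate below are illustrative assumptions, not part of the proposal:

```sql
-- External table over raw CSV data as delivered by a 3rd party
CREATE EXTERNAL TABLE raw_events (
  event_id   BIGINT,
  event_time STRING,
  payload    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/events';

-- Hand-maintained copy in an optimal storage format (ORC here)
CREATE TABLE events_orc STORED AS ORC
AS SELECT * FROM raw_events;

-- Each time new raw data arrives, the copy must be refreshed by hand
INSERT INTO TABLE events_orc
SELECT * FROM raw_events
WHERE event_time > '2014-10-01';  -- ad-hoc incremental cutoff
```

As the issue title suggests, a Table Copy would move this incremental refresh into a background task managed by Hive instead of leaving it to users to script and schedule.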



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
