hive-issues mailing list archives

From "Thomas Poepping (JIRA)" <>
Subject [jira] [Commented] (HIVE-1620) Patch to write directly to S3 from Hive
Date Wed, 21 Dec 2016 13:12:59 GMT


Thomas Poepping commented on HIVE-1620:

Hi Sahil,
Yes, direct write works well in production. There are definitely some difficult design decisions
to be made, and, as you say, there is no great solution for cleaning up after a failure. Other
issues include data loss with self-referencing INSERT OVERWRITE, metadata loss with dynamic
partitioning, and poor visibility into partial results. There are workarounds and best practices for these, though.
We are happy to engage in conversation about pros and cons.
The biggest thing we would like to stress with these implementations is that they should be
pluggable. The solution should be as generic as possible to avoid spaghetti code.
We think the best way forward is to make this a conversation about the design. We are happy
to participate in a community design and implementation, drawing on our experience with these
types of issues.

> Patch to write directly to S3 from Hive
> ---------------------------------------
>                 Key: HIVE-1620
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
> We want to submit a patch to Hive which allows users to write files directly to S3.
> This patch allows users to specify an S3 location as the table output location and hence
> eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 more closely and more quickly.

This message was sent by Atlassian JIRA
