hive-dev mailing list archives

From "Vaibhav Aggarwal (JIRA)" <>
Subject [jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive
Date Tue, 21 Sep 2010 03:46:33 GMT


Vaibhav Aggarwal commented on HIVE-1620:

I tried to create a new class, S3FileSinkOperator, but extending the existing
FileSinkOperator class turns out to involve a lot of complexity.

Most of the required changes are in the createBucketFiles() method. Overriding that method
would lead to a lot of repeated code, which would be very hard to maintain. The method needs
to be refactored into smaller methods before FileSinkOperator can be extended cleanly. I should
be able to do that, but it seems to defeat the purpose of not changing FileSinkOperator much.
Please let me know if you are OK with refactoring the FileSinkOperator class into smaller methods.
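
A minimal sketch of what such a refactoring could look like, assuming a template-method
split. The class names, step methods, and "_tmp" suffix below are hypothetical stand-ins
and are not taken from Hive or from the attached patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileSinkOperator; NOT Hive's actual class.
class SketchFileSink {
    // Records each step so the control flow can be inspected.
    protected final List<String> log = new ArrayList<>();

    // The large method reduced to a fixed sequence of small, overridable steps.
    public final List<String> createBucketFiles(String finalPath) {
        String writePath = resolveWritePath(finalPath);
        openWriter(writePath);
        commit(writePath, finalPath);
        return log;
    }

    // Default: write under a temporary location (the "_tmp" suffix is an assumption).
    protected String resolveWritePath(String finalPath) {
        return finalPath + "/_tmp";
    }

    protected void openWriter(String path) {
        log.add("open " + path);
    }

    // Default: move the temporary output to its final location.
    protected void commit(String writePath, String finalPath) {
        log.add("move " + writePath + " -> " + finalPath);
    }
}

// Hypothetical S3 variant: overrides only the steps that differ.
class SketchS3FileSink extends SketchFileSink {
    @Override
    protected String resolveWritePath(String finalPath) {
        return finalPath; // write straight to the final location
    }

    @Override
    protected void commit(String writePath, String finalPath) {
        log.add("no move needed for " + finalPath);
    }
}
```

With this shape, a subclass repeats none of the shared logic, but the base class has to
be restructured first, which is the trade-off described above.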

Based on my investigation, I still feel that the current approach is better. You will notice
that there are very few changes to FileSinkOperator in the current patch.
I have just introduced a new variable, "fsSupportsMove", which is always used in parallel with
isNativeTable (an existing boolean variable).
The only reason I chose not to reuse the isNativeTable variable is to allow the functionality
of non-native tables to grow independently of the handling of file systems that do not support move.
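
For illustration only, the two-flag idea could be sketched as follows. The class, the
method names, and the scheme-based check for move support are assumptions made for this
example; the message does not say how the patch actually computes fsSupportsMove:

```java
import java.net.URI;

// Hypothetical sketch; not the code from HIVE-1620.patch.
final class SketchWritePlan {
    final boolean isNativeTable;   // existing notion: table is managed natively by Hive
    final boolean fsSupportsMove;  // new, separate notion: target file system can move files

    SketchWritePlan(boolean isNativeTable, String outputLocation) {
        this.isNativeTable = isNativeTable;
        this.fsSupportsMove = supportsMove(outputLocation);
    }

    // Assumption: locations with an "s3"-style scheme are treated as not supporting move.
    static boolean supportsMove(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null || !scheme.toLowerCase().startsWith("s3");
    }

    // Both flags gate the same temp-then-move code path, yet each can evolve separately.
    boolean writesViaTempDir() {
        return isNativeTable && fsSupportsMove;
    }

    String pathToWrite(String tmpPath, String finalPath) {
        return writesViaTempDir() ? tmpPath : finalPath;
    }
}
```

Keeping the flags separate means a later change to non-native table handling would not
silently alter the behavior for file systems without move support, and vice versa.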

Please review the patch once more with the above argument in mind and let me know which
approach you think is best.


> Patch to write directly to S3 from Hive
> ---------------------------------------
>                 Key: HIVE-1620
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
> We want to submit a patch to Hive which allows users to write files directly to S3.
> This patch allows users to specify an S3 location as the table output location and hence
> eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and quicker.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
