hive-issues mailing list archives

From "Lefty Leverenz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory
Date Tue, 07 Mar 2017 11:05:37 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899249#comment-15899249 ]

Lefty Leverenz commented on HIVE-15121:
---------------------------------------

Sergio Peña documented *hive.blobstore.optimizations.enabled* in a new Blobstore section
of Hive Configuration Properties:

* [Configuration Properties -- Blobstore (i.e. Amazon S3) | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Blobstore(i.e.AmazonS3)]
* [Configuration Properties -- hive.blobstore.optimizations.enabled | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.blobstore.optimizations.enabled]

Removed the TODOC2.2 label.

> Last MR job in Hive should be able to write to a different scratch directory
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15121
>                 URL: https://issues.apache.org/jira/browse/HIVE-15121
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, HIVE-15121.3.patch, HIVE-15121.patch,
>                      HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, but the final
> MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that for a multi-job
> query, all intermediate MR jobs write to HDFS, and then the final job writes to S3. Writing
> to HDFS should be faster than writing to S3, so it makes more sense to write intermediate
> data to HDFS.
> The advantage of having the final job write to S3 is that any copying of data that needs
> to be done from the scratch directory to the final table directory can be done server-side,
> within the blobstore. The MoveTask simply renames data from the scratch directory to the
> final table location, which should translate to a server-side COPY request. This way
> HiveServer2 doesn't have to copy any data itself; it just tells the blobstore to do all
> the work.
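
The scratch-directory choice described above can be sketched roughly as follows. This is an illustrative model only, not Hive's actual implementation: the function names, the job-N directory layout, the staging subdirectory name, and the hard-coded scheme set are all assumptions made for the example. It only shows the idea that intermediate jobs stage on HDFS while the final job stages in the same blobstore as the table, so the last move stays inside one store.

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as blobstores for this sketch.
BLOBSTORE_SCHEMES = {"s3", "s3a", "s3n"}


def is_blobstore(path):
    """Return True if the path's URI scheme is one of the assumed blobstore schemes."""
    return urlparse(path).scheme in BLOBSTORE_SCHEMES


def plan_scratch_dirs(num_jobs, table_location, hdfs_scratch, optimizations_enabled=True):
    """Pick a scratch directory for each MR job in a multi-job query.

    Hypothetical helper: intermediate jobs always stage on HDFS; only the
    final job stages next to the table, and only when the table lives on a
    blobstore and the optimization is enabled.
    """
    dirs = []
    for i in range(num_jobs):
        is_final = i == num_jobs - 1
        if is_final and optimizations_enabled and is_blobstore(table_location):
            # Hypothetical layout: a staging directory under the table location.
            dirs.append(f"{table_location.rstrip('/')}/.hive-staging/job-{i}")
        else:
            dirs.append(f"{hdfs_scratch.rstrip('/')}/job-{i}")
    return dirs


def same_store(src, dst):
    """True when both paths share scheme and authority (bucket or namenode),
    i.e. a rename between them never has to leave that store."""
    a, b = urlparse(src), urlparse(dst)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


if __name__ == "__main__":
    table = "s3a://bucket/warehouse/t"
    for d in plan_scratch_dirs(3, table, "hdfs://nn:8020/tmp/hive"):
        print(d, "same store as table:", same_store(d, table))
```

With the optimization switched off in this sketch, every job, including the last, stages on HDFS, and the final move would cross from HDFS into the blobstore.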



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
