hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15215) Investigate if staging data on S3 can always go under the scratch dir
Date Thu, 17 Nov 2016 22:58:58 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar updated HIVE-15215:
--------------------------------
    Summary: Investigate if staging data on S3 can always go under the scratch dir  (was: Files on S3 are deleted one by one in INSERT OVERWRITE queries)

> Investigate if staging data on S3 can always go under the scratch dir
> ---------------------------------------------------------------------
>
>                 Key: HIVE-15215
>                 URL: https://issues.apache.org/jira/browse/HIVE-15215
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>
> When running {{INSERT OVERWRITE}} queries, the files to be overwritten are deleted one by one. The reason is that, by default, hive.exec.stagingdir is inside the target table directory.
> Ideally Hive would just delete the entire table directory, but it can't do that because the staging data is also inside that directory. Instead it deletes each file individually, which is very slow.
> There are a few ways to fix this:
> 1: Move the staging directory outside the table location. This can be done by setting hive.exec.stagingdir to a different location when running on S3. It would be nice if users didn't have to set this explicitly when running on S3 and things just worked out of the box. My understanding is that hive.exec.stagingdir was only added to support HDFS encryption zones. Since S3 doesn't have encryption zones, there should be no problem with using the value of hive.exec.scratchdir to store all intermediate data instead.
> 2: Multi-thread the delete operations
> 3: See if the {{S3AFileSystem}} can expose some type of bulk delete op
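
As a rough illustration of option 2, the sketch below deletes the old files of a table directory in parallel while leaving the staging directory in place. It is not Hive code: it uses the local filesystem as a stand-in for S3, and the function names, the ".hive-staging" prefix, the file names, and the worker count are all assumptions made for the example.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def overwrite_cleanup(table_dir, staging_prefix=".hive-staging", workers=8):
    """Delete every entry in table_dir except staging entries, in parallel.

    staging_prefix is an assumed name for the staging directory; the local
    filesystem stands in for S3 here.
    """
    victims = [
        os.path.join(table_dir, name)
        for name in os.listdir(table_dir)
        if not name.startswith(staging_prefix)
    ]

    def remove(path):
        # Each call is one independent delete, so calls can overlap.
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map re-raises any exception from a failed delete.
        return sorted(pool.map(remove, victims))


def demo():
    """Build a throwaway table directory and clean it up."""
    table_dir = tempfile.mkdtemp()
    try:
        for i in range(5):
            open(os.path.join(table_dir, f"00000{i}_0"), "w").close()
        os.mkdir(os.path.join(table_dir, ".hive-staging_demo"))
        removed = overwrite_cleanup(table_dir)
        return len(removed), sorted(os.listdir(table_dir))
    finally:
        shutil.rmtree(table_dir)


if __name__ == "__main__":
    print(demo())
```

With the staging data moved out of the table directory (option 1), the per-file loop would not be needed at all, since the whole directory could be removed in a single call.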



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
