hive-issues mailing list archives

From "Thomas Poepping (Jira)" <>
Subject [jira] [Updated] (HIVE-22928) Allow hive.exec.stagingdir to be a fully qualified directory name
Date Wed, 17 Jun 2020 17:18:00 GMT


Thomas Poepping updated HIVE-22928:
    Attachment: HIVE-22928.6.patch

> Allow hive.exec.stagingdir to be a fully qualified directory name
> -----------------------------------------------------------------
>                 Key: HIVE-22928
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration, Hive
>    Affects Versions: 3.1.2
>            Reporter: Thomas Poepping
>            Assignee: Thomas Poepping
>            Priority: Minor
>         Attachments: HIVE-22928.2.patch, HIVE-22928.3.patch, HIVE-22928.4.patch, HIVE-22928.5.patch,
HIVE-22928.6.patch, HIVE-22928.patch
> Currently, {{hive.exec.stagingdir}} can only be set to a relative directory name, which,
for operations like {{insert}} or {{insert overwrite}}, is placed under either the table
directory or the partition directory.
> For cases where an HDFS cluster is small but the data being inserted is very large (greater
than the capacity of the HDFS cluster, as mentioned in a comment by [~ashutoshc] on [HIVE-14270]),
the client may want to set the staging directory to an explicit blobstore path (or any
filesystem path), rather than relying on Hive to build the blobstore path from its own
interpretation of the job. We may lose locality guarantees, but because renames on blobstores
are equally expensive regardless of the prefix, this isn't considered a significant loss
(assuming only blobstore customers use this functionality).
> Note that {{}} doesn't actually suffice in
this case, as the stagingdir is not the same.
> This commit enables Hive customers to set an absolute location for all staging directories.
When the scheme of the configured stagingdir differs from the scheme of the table location,
the default stagingdir configuration is used instead. This avoids a cross-filesystem
rename, which is impossible anyway.
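
The scheme check described above could be sketched roughly as follows. This is an illustrative
sketch only, not the code in the attached patch: the class {{StagingDirResolver}}, the method
{{resolve}}, and the example locations are hypothetical, {{.hive-staging}} is assumed as the
default staging directory name, and only scheme-qualified values are treated as absolute.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Hive or of the attached patch.
class StagingDirResolver {
    // Assumed default value of hive.exec.stagingdir.
    static final String DEFAULT_STAGING_DIR = ".hive-staging";

    // Returns the staging location for a table, given the configured staging dir.
    static String resolve(String configured, String tableLocation) {
        URI staging = URI.create(configured);
        URI table = URI.create(tableLocation);

        // No scheme: treat as a relative name and place it under the table location.
        if (staging.getScheme() == null) {
            return tableLocation + "/" + configured;
        }
        // Same scheme as the table: use the fully qualified staging dir as configured.
        if (staging.getScheme().equalsIgnoreCase(table.getScheme())) {
            return configured;
        }
        // Different scheme: fall back to the default to avoid a cross-filesystem rename.
        return tableLocation + "/" + DEFAULT_STAGING_DIR;
    }

    public static void main(String[] args) {
        System.out.println(resolve("s3a://bucket/staging", "s3a://bucket/warehouse/t"));
        System.out.println(resolve("s3a://bucket/staging", "hdfs://nn:8020/warehouse/t"));
    }
}
```

In this sketch a table on {{s3a}} with an {{s3a}} staging dir keeps the configured location,
while a table on {{hdfs}} falls back to a staging directory under the table location.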

This message was sent by Atlassian Jira
