hive-issues mailing list archives

From "anishek (JIRA)" <>
Subject [jira] [Commented] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
Date Wed, 21 Jun 2017 06:00:00 GMT


anishek commented on HIVE-16918:

I think HIVE_IN_TEST is definitely a better indicator. However, I was thinking that any method
on the pfile implementation should first check for "HIVE_IN_TEST", and everywhere else we can just
do the pfile/file scheme check. This way we won't be using HIVE_IN_TEST in various classes,
as it will be limited to the ProxyFileSystem class only, and we can treat pfile as a regular
scheme everywhere else. What do you think?
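
To make the suggestion concrete, here is a minimal sketch of the split I have in mind. The class and method names are hypothetical and not taken from the Hive code base, and the HIVE_IN_TEST value is passed in as a plain boolean instead of being read from the configuration. The point is only that the test-mode guard lives in one place, while other callers look at the scheme alone.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper; names are illustrative, not the actual Hive API. */
final class SchemeCheckSketch {
    private SchemeCheckSketch() {}

    /** Scheme-only check that callers outside the pfile implementation would use. */
    static boolean isLocalScheme(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            // No scheme present: we cannot tell, so do not assume a local path.
            return false;
        }
        String s = scheme.toLowerCase(Locale.ROOT);
        return s.equals("file") || s.equals("pfile");
    }

    /**
     * Guard that would sit only inside the pfile implementation.
     * hiveInTest stands in for the HIVE_IN_TEST setting (an assumption of this sketch).
     */
    static void requireTestMode(boolean hiveInTest) {
        if (!hiveInTest) {
            throw new IllegalStateException("pfile is only supported when HIVE_IN_TEST is set");
        }
    }
}
```

With this shape, a caller such as the copy task would only call isLocalScheme(...), and only the pfile methods would call requireTestMode(...).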

> Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
> --------------------------------------------------------------------------
>                 Key: HIVE-16918
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 3.0.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-16918.2.patch, HIVE-16918.patch
> With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. This, however,
> is incorrect for copying _metadata generated from a temporary scratch directory to HDFS. We
> need to change that so that it routes to a regular CopyTask. The issue with using distcp
> for this is that distcp launches from another job, which may be queued on another machine
> that does not have access to this file:// URI. Distcp should only ever be used when copying
> from non-local filesystems.
> Also, in the spirit of following up on HIVE-16686, we missed adding "-pb" as a default for
> invocations of distcp from Hive. Adding that in. This would not be necessary if HADOOP-8143
> had made it in, but until it goes in, we need it.
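
The routing described in the issue could be sketched roughly as follows. This is an illustration under stated assumptions and not the contents of the attached patch: the class, enum, and method names are hypothetical, a source is treated as local only when its scheme is file, and the distcp arguments are built as a plain list of strings with "-pb" placed first.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the copy routing; not the actual ReplCopyTask code. */
final class ReplCopySketch {
    private ReplCopySketch() {}

    enum CopyMode { REGULAR_COPY, DISTCP }

    /** Local file:// sources use a regular copy; everything else goes to distcp. */
    static CopyMode chooseCopyMode(URI source) {
        // Assumption: a missing scheme is treated as non-local in this sketch.
        boolean local = "file".equalsIgnoreCase(source.getScheme());
        return local ? CopyMode.REGULAR_COPY : CopyMode.DISTCP;
    }

    /** Builds distcp arguments with "-pb" (preserve block size) as a default. */
    static List<String> distcpArgs(URI source, URI target) {
        List<String> args = new ArrayList<>();
        args.add("-pb");
        args.add(source.toString());
        args.add(target.toString());
        return args;
    }
}
```

For example, a _metadata file under a file:// scratch directory would map to REGULAR_COPY, while an hdfs:// source would map to DISTCP and receive the "-pb" default.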

This message was sent by Atlassian JIRA