hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
Date Fri, 11 Aug 2017 14:00:05 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051 ]

Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 1:59 PM:
------------------------------------------------------------------

Added 01.patch with the following changes.
- Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD).
- Set the distcp doAs input to null for the EXPORT and IMPORT flows. REPL LOAD uses the user
config hive.distcp.privileged.doAs.
- Assumed lazy copy is set only for REPL LOAD, and hence set the doAs user input to
hive.distcp.privileged.doAs if lazy copy is true and to null if it is false. This avoids passing
the argument in from multiple flows; also, incremental REPL LOAD shares common code with IMPORT.
- Enabled distcp for copies within the same file system when there are a large number of files
or large files.
- Removed redundant code in ReplCopyTask/ReplCopyWork, which now reuse the CopyUtils implementation
that does the same thing.
- Refactored ReplCopyTask.execute to properly distinguish the code path that reads _files from
the one that handles actual data files.
- Set the default value of hive.distcp.privileged.doAs to "hive".
- Moved CopyUtils from parse.repl.dump.io to the parse.repl package, as it is common to dump and load.
- No tests added, as the existing tests themselves cover the changes except for the distcp flow
(due to hive.in.test), which needs to be tested manually.
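
For reviewers, the doAs selection and the distcp trigger described in the list above can be
sketched roughly as below. This is only an illustration, not code from the patch: the class
name, method names, and threshold parameters are hypothetical, and the real logic lives in
CopyUtils/ReplCopyTask.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Hive code base.
public class DistCpDecisionSketch {

    // Default value the patch sets for hive.distcp.privileged.doAs.
    static final String DEFAULT_PRIVILEGED_USER = "hive";

    // Lazy copy is assumed to be set only for REPL LOAD, so it selects the
    // privileged user; EXPORT/IMPORT get null (no doAs user for distcp).
    static String resolveDoAsUser(boolean lazyCopy, String privilegedUser) {
        return lazyCopy ? privilegedUser : null;
    }

    // Use distcp when there are many files or the total size is large.
    // Both thresholds are illustrative parameters, not actual Hive defaults.
    static boolean shouldUseDistCp(List<Long> fileSizes, int maxFiles, long maxTotalBytes) {
        long totalBytes = 0L;
        for (long size : fileSizes) {
            totalBytes += size;
        }
        return fileSizes.size() > maxFiles || totalBytes > maxTotalBytes;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 20L, 30L);
        System.out.println("REPL LOAD doAs: " + resolveDoAsUser(true, DEFAULT_PRIVILEGED_USER));
        System.out.println("IMPORT doAs: " + resolveDoAsUser(false, DEFAULT_PRIVILEGED_USER));
        System.out.println("use distcp: " + shouldUseDistCp(sizes, 2, 1024L));
    }
}
```

Deriving the doAs user from the lazy-copy flag keeps the argument out of the individual call
sites, which is the motivation given in the third item above.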

Request [~thejas]/[~daijy] to please review it!


> EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17289
>                 URL: https://issues.apache.org/jira/browse/HIVE-17289
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, Export, Import, replication
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17289.01.patch
>
>
> Currently, EXPORT uses distcp to dump data files to the dump directory, and IMPORT uses distcp
> to copy larger files or a large number of files from the dump directory to the table staging directory.
> However, this copy fails because distcp is always run as the doAs user specified in hive.distcp.privileged.doAs,
> which is "hdfs" by default.
> The doAs user should not be used when distcp is invoked from the EXPORT/IMPORT flow.
> Privileged-user-based distcp should be done only for the REPL DUMP/LOAD commands.
> Also, the default value of hive.distcp.privileged.doAs needs to be set to "hive", as the "hdfs"
> super-user is never allowed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
