hive-issues mailing list archives

From "Sushanth Sowmyan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16686) repli invocations of distcp needs additional handling
Date Mon, 22 May 2017 09:53:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-16686:
------------------------------------
    Release Note: 
This introduces parsing of additional parameters that are not directly used by hive, but are
passed on to distcp when hive invokes it. Users can now issue "set" commands in hive to pass
command-line arguments along to distcp.

Any parameter set as "set distcp.options.blah=''" will result in an extra "-blah" argument
going to distcp, and any parameter set as "set distcp.options.foo='bar'" will result
in an extra "-foo bar" argument going to distcp.

Currently, we always pass along "-update" and "-skipcrccheck" to distcp. These are retained
as defaults if no distcp.options.* params are found. If any are found, then these options
are not added by default, letting the user instead provide an explicit list.
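
The mapping rules above can be illustrated with a minimal sketch. This is not the code from the
patch; the function name build_distcp_args and the plain dict standing in for the session
configuration are hypothetical, and it assumes that surrounding quotes have already been stripped
from the values, so that '' arrives as an empty string.

```python
# Hypothetical sketch of the distcp.options.* mapping described in the release note.
# Not Hive's implementation: names and the dict-based config are assumptions.

PREFIX = "distcp.options."
DEFAULT_ARGS = ["-update", "-skipcrccheck"]


def build_distcp_args(conf):
    """Translate distcp.options.* settings into a list of distcp CLI arguments."""
    args = []
    for key, value in conf.items():
        if not key.startswith(PREFIX):
            continue  # unrelated settings are ignored
        args.append("-" + key[len(PREFIX):])
        if value:  # an empty value means a bare flag, e.g. "-blah"
            args.append(value)
    # The defaults apply only when no distcp.options.* parameter was found.
    return args if args else list(DEFAULT_ARGS)


print(build_distcp_args({}))                             # ['-update', '-skipcrccheck']
print(build_distcp_args({"distcp.options.blah": ""}))    # ['-blah']
print(build_distcp_args({"distcp.options.foo": "bar"}))  # ['-foo', 'bar']
```

Note that, under these rules, a user who sets any distcp.options.* parameter and still wants
"-update" or "-skipcrccheck" has to set those explicitly as well.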

Note that all of these properties affect how distcp runs when it is launched by hive, but
are not directly hive settings. Instead, hive will allow setting them through the use of the
"set" command.

  was:
This introduces parsing of additional parameters that are not directly used by hive, but are
passed on to distcp when hive invokes it. We now introduce the ability to use the hive command
to do "set" commands to pass along cli arguments to distcp.

Any parameter set as "set distcp.options.blah=''" will result in an extra "-blah" argument
going into distcp, as well as any parameter set as "set distcp.options.foo='bar'" will result
in an extra "-foo bar" argument going to distcp.

Currently, we always pass along "-update" and "-skipcrccheck" to distcp. These are retained
as defaults if no distcp.options.* params are found. If any are found, then these options
are not added by default, letting the user instead provide an explicit list.

In addition, one new special option parameter, "distcp.option.privilegedUser", is being added
that is not passed along to distcp. Instead, if this parameter is specified and the user to be
impersonated is different from the current user, hive will run distcp inside an impersonation
context as that specified user. This, however, requires that the user have impersonation
proxy privileges (something that a HS2 instance typically will have, but not a regular
end-user).

Note that all of these properties affect how distcp runs when it is launched by hive, but
are not directly hive settings. Instead, hive will allow setting them through the use of the
"set" command.


> repli invocations of distcp needs additional handling
> -----------------------------------------------------
>
>                 Key: HIVE-16686
>                 URL: https://issues.apache.org/jira/browse/HIVE-16686
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>              Labels: TODOC3.0
>         Attachments: HIVE-16686.1.patch, HIVE-16686.2.patch
>
>
> When REPL LOAD invokes distcp, there needs to be a way for the user invoking REPL LOAD
> to pass on arguments to distcp. In addition, there is sometimes a need for distcp to be invoked
> from within an impersonated context, such as running as user "hdfs" and asking distcp to
> preserve the ownership of individual files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
