hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vihang Karajgaonkar <vih...@cloudera.com>
Subject Re: Review Request 52283: HIVE-14822 : Add support for credential provider for jobs launched from Hiveserver2
Date Wed, 12 Oct 2016 17:44:32 GMT


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > common/src/java/org/apache/hadoop/hive/common/EnvironmentUtils.java, line 25
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523416#file1523416line25>
> >
> >     Is this actually mocked anywhere in the tests ? I see the tests mock the env
via a HashMap.
> >     
> >     If the use of this interface is just to mock, I'm not sure if there is a good
reason to use in the non-test code. Could we just directly use System.getenv there ? i.e.
we can get rid of this file altogether.

Initially I had thought of using this interface to mock Environment variables. But found a
easier alternative later for tests. Removed it.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java, line 140
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523420#file1523420line140>
> >
> >     "needs to be " -> "is" ?

What I tried to convey is the password file needs to be readable by the consumers. Changed
it to "which needs to be readable by all consumers" to be more clear. Let me know if you have
any other suggestions


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java, lines 146-151
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523420#file1523420line146>
> >
> >     Could you make the following clear:
> >     
> >     Either:
> >     (1) HIVE_SERVER2_JOB_CREDSTORE_LOCATION should be configured AND HIVE_JOB_CREDSTORE_PASSWORD
environment should be set.
> >     OR
> >     (2) HADOOP_CREDSTORE_PASSWORD environment should be set.
> >     
> >     IOW It's important to state that HIVE_SERVER2_JOB_CREDSTORE_LOCATION and HADOOP_CREDSTORE_PASSWORD
do not work together.

HIVE_SERVER2_JOB_CREDSTORE_LOCATION does work with HADOOP_CREDSTORE_PASSWORD if the user wishes
to use it that way. The code actually uses HIVE_JOB_CREDSTORE_PASSWORD if it is set otherwise
uses HADOOP_CREDSTORE_PASSWORD. It checks for HIVE_JOB_CREDSTORE_PASSWORD only when HIVE_SERVER2_JOB_CREDSTORE_LOCATION
is set. If both HADOOP_CREDSTORE_PASSWORD and HIVE_JOB_CREDSTORE_PASSWORD are not set, it
is still a valid configuration since Hadoop credential provider will use the default password
of "none" in that case. I have changed the wording to explicitly convey this. Hopefully it
is clearer now.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java, line 179
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523420#file1523420line179>
> >
> >     If both HIVE_JOB_CREDSTORE_PASSWORD and HADOOP_CREDSTORE_PASSWORD are not set,
what happens ? Should we throw an exception ?
> >     
> >     What if HIVE_JOB_CREDSTORE_PASSWORD and HADOOP_CREDSTORE_PASSWORD are not set,
but HIVE_SERVER2_JOB_CREDSTORE_LOCATION is set ?

It is still a valid configuration since hadoop credential provider uses the default password
of "none" in that case. I have add that in the method documentation above.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java, lines 203-204
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523420#file1523420line203>
> >
> >     Wondering if this should really be a RuntimeException, since query is pretty
much hosed at this point, correct ? i.e. rollback the stack and abandon this query with an
exception.

You are right. There is no point in continuing since the query will fail anyways. Changed
it to throw RuntimeException


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java, lines
201-202
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523423#file1523423line201>
> >
> >     Needs space after //
> >     
> >     What happens to HIVE_SERVER2_JOB_CREDSTORE_LOCATION ? Does that need to be set
or checked if it is set ? I mean, do both HIVE_SERVER2_JOB_CREDSTORE_PASSWORD_ENVVAR and 
HIVE_SERVER2_JOB_CREDSTORE_LOCATION need to be set for former to be used ?
> >     
> >     It's worth clarifying here how Spark implementation differs from MR. I am assuming
this step is in addition to the updateJobCredentialProviders invocation in Spark.

HIVE_SERVER2_JOB_CREDSTORE_PASSWORD_ENVVAR does not need to be set in order for HIVE_SERVER2_JOB_CREDSTORE_LOCATION
to work like explained in one of my comments above. The difference in Spark implementation
and MR in this case is the environment variables for Spark are set using SparkConf Map in
the HiveSparkClientFactory unlike MR. The HIVE_SERVER2_JOB_CREDSTORE_LOCATION is set directly
in the jobConfig similar to MR in the execute method of LocalSparkClient and RemoteSparkClient.
I have updated the comments to say the same.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java, line 188
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523420#file1523420line188>
> >
> >     Is this ok if this is null or empty ?

I did some more investigation I made some changes to the method. It gets rid of the return
type since it was not really needed in the first place.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java, lines 422-426
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523422#file1523422line422>
> >
> >     Easier: job.set(Constants.HADOOP_CREDENTIAL_PROVIDER_PATH_CONFIG, origKeystoreLocation
== null: "", origKeystoreLocation)
> >     
> >     Wait, shouldn't we set the config before submitting the job ?

This check is removed now since it was not needed in the first place.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java, lines 793-794
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523422#file1523422line793>
> >
> >     are you fixing something that was broken ? JOBCONF_FILENAME was earlier local
but you to want to allow it on any fs ?

I wanted to move the jobConf.xml to HDFS from the local filesystem since I thought it would
be more secure. But I have now revert these changes since it had some other issues which I
discovered later. Also, it is not actually needed since jobConf.xml does not contain anything
different than earlier (before this patch). The password are injected in the jobConf in the
runtime using environment variables so it is safe to revert back to original implementation
of this method.


> On Oct. 6, 2016, 8:56 p.m., Mohit Sabharwal wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/exec/TestHiveCredentialProviders.java, line
103
> > <https://reviews.apache.org/r/52283/diff/4/?file=1523426#file1523426line103>
> >
> >     Are we testing the case where both HADOOP_CREDENTIAL_PASSWORD_ENVVAR and HIVE_JOB_CREDSTORE_PASSWORD_ENVVAR_VAL
are not set ?

I havent yet tested this case, I will test it and add a test case if possible.


- Vihang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52283/#review151693
-----------------------------------------------------------


On Oct. 12, 2016, 5:44 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52283/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2016, 5:44 p.m.)
> 
> 
> Review request for hive, Mohit Sabharwal and Sergio Pena.
> 
> 
> Bugs: HIVE-14822
>     https://issues.apache.org/jira/browse/HIVE-14822
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This change adds support for credential provider for jobs launched from HiveServer2.
It also adds support for job-specific credential provider and password which is passed to
the jobs via the job configs when they are launched from HS2. The hive specific credential
provider location is specified by a new configuration property specific to hiveserver2 and
the password to job credential provider is provided using the environment variable
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/FileUtils.java 3ed2d086fd8e14b9a70550c02082c1456f905a08

>   common/src/java/org/apache/hadoop/hive/conf/Constants.java 77c6aa5663c2c5358715801bc039c0fe8035e3dc

>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43a16d7fed1d38400793e7c96a7febebe42d351b

>   common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java 16c2eafdb6888ed62168dd00f76335fa2484753b

>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java cea9582c8ccb0c79700ac7913735d4cdf52f0c7e

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 784b9c9fa769eeb088e3a6408a0b29147a8b4086

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java c75333d7022f776aab184a4b804fd7ad7befac64

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java a9f70c4100219c8929abd4e31b30d340e6ba98bd

>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestHiveCredentialProviders.java PRE-CREATION

>   spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 936fdafdb37c461a7a5deb97cba72d4db54a49e1

> 
> Diff: https://reviews.apache.org/r/52283/diff/
> 
> 
> Testing
> -------
> 
> Testing running multiple queries on S3 with keys stored in a credential provider
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message