hive-user mailing list archives

From Alan Gates <>
Subject Re: Hive Context: Hive Metastore Client
Date Wed, 09 Mar 2016 15:58:32 GMT
One way people have gotten around the lack of LDAP connectivity in HS2 is to use Apache
Knox.  That project’s goal is to provide a single login capability for Hadoop-related projects
so that users can tie their LDAP or Active Directory servers into Hadoop.


> On Mar 8, 2016, at 16:00, Mich Talebzadeh <> wrote:
> The current scenario resembles a three-tier architecture but without the security of the
> second tier. In a typical three-tier setup, users connecting to the application server (read
> HiveServer2) are independently authenticated and, if OK, the second tier creates new .NET-type
> or JDBC threads to connect to the database, much like multi-threading. The problem, I believe,
> is that HiveServer2 does not yet have that concept of handling individual logins.
> HiveServer2 should be able to handle LDAP logins as well. It is a useful layer to have.
> Dr Mich Talebzadeh
> On 8 March 2016 at 23:28, Alex <> wrote:
> Yes, when creating a Hive Context, a Hive Metastore client should be created with a user
> that the Spark application will use to talk to the *remote* Hive Metastore. We would like to
> add a custom authorization plugin to our remote Hive Metastore to authorize the query requests
> that the Spark application submits, which would also add authorization for any other
> applications hitting the Hive Metastore. Furthermore, we would like to extend this so that
> we can submit "jobs" to our Spark application that run against the metastore
> as different users while leveraging the abilities of our Spark cluster. But, as you mentioned,
> only one login connects to the Hive Metastore, and it is shared among all HiveContext sessions.
> Likely the authentication would have to be completed either through a secured Hive Metastore
> (Kerberos) or by having the requests go through HiveServer2.
> --Alex
> On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
>> Hi,
>> What do you mean by Hive Metastore Client? Are you referring to a Hive server login,
>> much like beeline?
>> Spark uses hive-site.xml to get the details of the Hive metastore and the login to the
>> metastore, which could be any database. Mine is Oracle and, as far as I know, even in Hive 2
>> hive-site.xml has an entry for javax.jdo.option.ConnectionUserName that specifies the username
>> to use against the metastore database. These are all multi-threaded JDBC connections to the
>> database under the same login, as shown below:
>> LOGIN    SID/serial# LOGGED IN S HOST       OS PID         Client PID     PROGRAM         MEM/KB Logical I/O Physical I/O ACT
>> -------- ----------- ----------- ---------- -------------- -------------- --------------- ------ ----------- ------------ ---
>> HIVEUSER 67,6160     08/03 08:11 rhes564    oracle/20539   hduser/1234    JDBC Thin Clien  1,017          37            0 N
>> HIVEUSER 89,6421     08/03 08:11 rhes564    oracle/20541   hduser/1234    JDBC Thin Clien  1,081         528            0 N
>> HIVEUSER 112,561     08/03 10:45 rhes564    oracle/24624   hduser/1234    JDBC Thin Clien    889          37            0 N
>> HIVEUSER 131,8811    08/03 08:11 rhes564    oracle/20543   hduser/1234    JDBC Thin Clien  1,017          37            0 N
>> HIVEUSER 47,30114    08/03 10:45 rhes564    oracle/24626   hduser/1234    JDBC Thin Clien  1,017          37            0 N
>> HIVEUSER 170,8955    08/03 08:11 rhes564    oracle/20545   hduser/1234    JDBC Thin Clien  1,017         323            0 N
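
The single database login in the listing above comes from the javax.jdo.option.ConnectionUserName entry in hive-site.xml. As a rough illustration, here is a minimal Python sketch that reads that property from a Hadoop-style configuration file. The sample XML, the value "hiveuser", the host name "metastore-host", and the helper name read_hive_property are all assumptions made for illustration and are not taken from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical hive-site.xml content. The user value and host name are
# assumptions chosen only to mirror the HIVEUSER login discussed in the thread.
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
"""


def read_hive_property(xml_text, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Every metastore connection to the backing database uses this one user.
print(read_hive_property(SAMPLE_HIVE_SITE, "javax.jdo.option.ConnectionUserName"))
```

Because this value is read once from the file, every JDBC connection the metastore opens to its backing database carries the same login, regardless of which end user issued the request.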
>> As I understand it, what you are suggesting is that each Spark user uses a different login
>> to connect to the Hive metastore. As of now there is only one login that connects to the Hive
>> metastore, shared among all
>> 2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit (
>> - ugi=hduser      ip=       cmd=source: get_table : db=test tbl=t
>> 2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit (
>> - ugi=hduser      ip=       cmd=source: get_tables: db=asehadoop
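
One way to confirm that every metastore request arrives under the same identity is to tally the ugi field of the audit entries. The following is a minimal Python sketch that does this for the two entries quoted above. The entries are rejoined onto single lines, their source-location and IP fields are left empty as they appear in the archive, and the regular expression and the helper name count_users are illustrative assumptions rather than anything provided by Hive.

```python
import re
from collections import Counter

# The two audit entries from the thread, rejoined onto single lines.
# The text after "(" and the ip value are elided in the archive and left as-is.
AUDIT_LINES = [
    "2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit ( "
    "- ugi=hduser      ip=       cmd=source: get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit ( "
    "- ugi=hduser      ip=       cmd=source: get_tables: db=asehadoop",
]

# Assumed layout: "ugi=<user> ip=<address or empty> cmd=<rest of line>".
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S*)\s+cmd=(?P<cmd>.*)$")


def count_users(lines):
    """Count audit entries per ugi value, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if match:
            counts[match.group("ugi")] += 1
    return counts


print(dict(count_users(AUDIT_LINES)))
```

A single key in the resulting tally means that the metastore saw only one user, which is the situation described in this thread.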
>> And this is an entry in the Hive log when a connection is made through the Zeppelin UI:
>> 2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: metastore.HiveMetaStore (
>> - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>> 2016-03-08T23:20:13,547 INFO  [pool-5-thread-84]: metastore.ObjectStore (
>> - ObjectStore, initialize called
>> 2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.MetaStoreDirectSql (<init>(142))
>> - Using direct SQL, underlying DB is ORACLE
>> 2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.ObjectStore (
>> - Initialized ObjectStore
>> I am not sure there is currently any plan to allow different logins to the Hive
>> Metastore, but it would add another level of security. I am also not sure how this would
>> be authenticated.
>> HTH
>> Dr Mich Talebzadeh
>> On 8 March 2016 at 22:23, Alex F <> wrote:
>> As of Spark 1.6.0 it is now possible to create new Hive Context sessions that share
>> various components, but right now the Hive Metastore Client is shared among all new Hive
>> Context sessions.
>> Are there any plans to create individual Metastore Clients for each Hive Context?
>> Related to the question above, are there any plans to create an interface for customizing
>> the username that the Metastore Client uses to connect to the Hive Metastore? Right now it
>> either uses the user specified in an environment variable or the application's process owner.
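
The username resolution described in the original question (an environment variable first, then the process owner) can be sketched in a few lines of Python. This is only an illustration of the lookup order, not the actual client code. The thread does not name the variable; HADOOP_USER_NAME is used here as an assumption because it is the variable Hadoop commonly honours when Kerberos is not enabled, and the function name effective_user is hypothetical.

```python
import getpass
import os


def effective_user(env=None):
    """Resolve a user name: environment override first, then the process owner."""
    env = os.environ if env is None else env
    # Assumption: the thread only says "an environment variable";
    # HADOOP_USER_NAME is the usual one in unsecured Hadoop setups.
    override = env.get("HADOOP_USER_NAME")
    if override:
        return override
    # Fall back to whoever owns the running process.
    return getpass.getuser()


# With the override set, the process owner is ignored.
print(effective_user({"HADOOP_USER_NAME": "alice"}))
```

Since the name is resolved once per process, every Hive Context session created inside the same Spark application ends up with the same metastore identity, which is the limitation this thread is about.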

