Subject: Re: Hive Context: Hive Metastore Client
From: Mich Talebzadeh
To: Alex
Cc: "user@spark", user@hive.apache.org
Date: Wed, 9 Mar 2016 00:00:40 +0000

The current scenario resembles a three-tier architecture but without the security of the second tier. In a typical three-tier setup, users connecting to the application server (read HiveServer2) are independently authenticated and, if OK, the second tier creates new .NET-type or JDBC threads to connect to the database, much like multi-threading. The problem, I believe, is that HiveServer2 does not yet have that concept of handling individual logins. HiveServer2 should be able to handle LDAP logins as well. It is a useful layer to have.
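
To make the three-tier point concrete, here is a minimal Python sketch. It is not Hive code: `MiddleTier`, `USER_DIRECTORY`, and `SERVICE_LOGIN` are hypothetical names, with a plain dictionary standing in for LDAP. The middle tier authenticates each end user individually, then talks to the database with one shared login and keeps its own record of who the request was for.

```python
from dataclasses import dataclass, field

# Hypothetical user directory standing in for LDAP; not a real Hive API.
USER_DIRECTORY = {"alice": "secret1", "bob": "secret2"}
# Hypothetical shared database login used by the middle tier.
SERVICE_LOGIN = "HIVEUSER"


@dataclass
class MiddleTier:
    """Toy second tier: authenticates users, then uses one database login."""

    audit: list = field(default_factory=list)

    def authenticate(self, user: str, password: str) -> bool:
        # Each end user is checked individually before any database work.
        return USER_DIRECTORY.get(user) == password

    def run(self, user: str, password: str, request: str) -> str:
        if not self.authenticate(user, password):
            raise PermissionError(f"authentication failed for {user}")
        # The database only ever sees SERVICE_LOGIN; the end user's identity
        # is kept in the middle tier's own audit trail.
        self.audit.append((user, SERVICE_LOGIN, request))
        return f"{SERVICE_LOGIN} executed {request!r} on behalf of {user}"
```

Without the `authenticate` step, every caller would reach the database under the shared login with no per-user check, which is the gap described above.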


On 8 March 2016 at 23:28, Alex <this.side.of.confusion@gmail.com> wrote:
Yes, when creating a Hive Context, a Hive Metastore client should be created with the user that the Spark application will use to talk to the *remote* Hive Metastore. We would like to add a custom authorization plugin to our remote Hive Metastore to authorize the query requests that the Spark application submits, which would also add authorization for any other applications hitting the Hive Metastore. Furthermore, we would like to extend this so that we can submit "jobs" to our Spark application that run against the metastore as different users while leveraging the abilities of our Spark cluster. But, as you mentioned, only one login connects to the Hive Metastore, and it is shared among all HiveContext sessions.

Likely the authentication would have to be completed either through a secured Hive Metastore (Kerberos) or by having the requests go through HiveServer2.

--Alex


On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
Hi,

What do you mean by Hive Metastore Client? Are you referring to Hive server login much like beeline?

Spark uses hive-site.xml to get the details of the Hive metastore and the login to the metastore, which could be any database. Mine is Oracle and, as far as I know, even in Hive 2, hive-site.xml has an entry for javax.jdo.option.ConnectionUserName that specifies the username to use against the metastore database. These are all multi-threaded JDBC connections to the database using the same login, as shown below:

LOGIN    SID/serial# LOGGED IN S HOST       OS PID         Client PID     PROGRAM               MEM/KB      Logical I/O Physical I/O ACT
-------- ----------- ----------- ---------- -------------- -------------- --------------- ------------ ---------------- ------------ ---
INFO
-------
HIVEUSER 67,6160     08/03 08:11 rhes564    oracle/20539   hduser/1234    JDBC Thin Clien        1,017               37            0 N
HIVEUSER 89,6421     08/03 08:11 rhes564    oracle/20541   hduser/1234    JDBC Thin Clien        1,081              528            0 N
HIVEUSER 112,561     08/03 10:45 rhes564    oracle/24624   hduser/1234    JDBC Thin Clien          889               37            0 N
HIVEUSER 131,8811    08/03 08:11 rhes564    oracle/20543   hduser/1234    JDBC Thin Clien        1,017               37            0 N
HIVEUSER 47,30114    08/03 10:45 rhes564    oracle/24626   hduser/1234    JDBC Thin Clien        1,017               37            0 N
HIVEUSER 170,8955    08/03 08:11 rhes564    oracle/20545   hduser/1234    JDBC Thin Clien        1,017              323            0 N
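
As a rough illustration of where that single login comes from, the Python sketch below reads javax.jdo.option.ConnectionUserName from a hive-site.xml-style document. The XML fragment is an assumed example, not my actual file, and the value `hiveuser` is chosen only to mirror the HIVEUSER login in the listing above. `get_property` is a hypothetical helper, not part of Hive or Spark.

```python
import xml.etree.ElementTree as ET

# Assumed hive-site.xml fragment for illustration; the value is hypothetical.
HIVE_SITE_XML = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
</configuration>
"""


def get_property(xml_text, key):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


# Every metastore connection opened from this configuration uses this login.
print(get_property(HIVE_SITE_XML, "javax.jdo.option.ConnectionUserName"))
```

Because the username is a single static property, every thread that opens a JDBC connection from this configuration appears in the database under the same login.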

As I understand it, you are suggesting that each Spark user use a different login to connect to the Hive metastore. As of now, there is only one login that connects to the Hive metastore, shared among all:

2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
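
One way to check this across a larger log is to pull the ugi, ip, and cmd fields out of each audit line. The Python sketch below does that for the two lines quoted above. The regular expression is an assumption based only on these two samples, and `parse_audit` and `logins_per_client` are hypothetical helper names.

```python
import re

# The two audit lines quoted above, with whitespace normalised.
AUDIT_LINES = [
    "2016-03-08T23:08:01,890 INFO  [pool-5-thread-72]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 "
    "cmd=source:50.140.197.217 get_table : db=test tbl=t",
    "2016-03-08T23:18:10,432 INFO  [pool-5-thread-81]: HiveMetaStore.audit "
    "(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 "
    "cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*",
]

# Assumed layout: ugi and ip are single tokens; cmd runs to the end of the line.
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+)\s+ip=(?P<ip>\S+)\s+cmd=(?P<cmd>.*)$")


def parse_audit(line):
    """Return a dict with ugi, ip and cmd, or None for non-audit lines."""
    match = AUDIT_RE.search(line)
    return match.groupdict() if match else None


def logins_per_client(lines):
    """Map each client IP to the set of ugi values seen from it."""
    seen = {}
    for line in lines:
        event = parse_audit(line)
        if event:
            seen.setdefault(event["ip"], set()).add(event["ugi"])
    return seen


print(logins_per_client(AUDIT_LINES))
```

For these two lines, both client addresses map to the same single ugi, hduser, which is the shared login being discussed.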

And this is the entry in the Hive log when a connection is made through the Zeppelin UI:

2016-03-08T23:20:13,546 INFO  [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-03-08T23:20:13,547 INFO  [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
2016-03-08T23:20:13,550 INFO  [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore

I am not sure there is currently any plan to allow different logins to the Hive Metastore. It would add another level of security, though I am not sure how this would be authenticated.

HTH



On 8 March 2016 at 22:23, Alex F <this.side.of.confusion@gmail.com> wrote:
As of Spark 1.6.0 it is now possible to create new Hive Context sessions sharing various components but right now the Hive Metastore Client is shared amongst each new Hive Context Session.

Are there any plans to create individual Metastore Clients for each Hive Context?

Related to the question above, are there any plans to create an interface for customizing the username that the Metastore Client uses to connect to the Hive Metastore? Right now it uses either the user specified in an environment variable or the application's process owner.
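
The fallback described in the last sentence can be sketched in Python as follows. This is an illustration under stated assumptions, not Spark's or Hive's code: the message does not name the environment variable, so `HADOOP_USER_NAME` is an assumption here, and `resolve_metastore_user` is a hypothetical helper.

```python
import getpass
import os

# Assumption: the message does not name the variable; HADOOP_USER_NAME is a guess.
USER_ENV_VAR = "HADOOP_USER_NAME"


def resolve_metastore_user(env=None, process_owner=None):
    """Pick the metastore username: environment override first, then process owner.

    `env` and `process_owner` can be passed explicitly so the logic is easy
    to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    override = env.get(USER_ENV_VAR)
    if override:
        return override
    # No override set: fall back to whoever owns the application process.
    return process_owner if process_owner is not None else getpass.getuser()
```

For example, `resolve_metastore_user({"HADOOP_USER_NAME": "alice"}, "hduser")` returns `alice`, while `resolve_metastore_user({}, "hduser")` falls back to `hduser`. Either way the choice is made once per process, which is why a per-session interface is being asked for.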


