From: Mich Talebzadeh
To: user@hive.apache.org
Date: Wed, 9 Mar 2016 16:20:57 +0000
Subject: Re: Hive Context: Hive Metastore Client

Thanks Alan for the info. I will have a look.
Some tools like MongoDB (which provides its own database) offer a layer of access control by creating an admin database through which admin users are authenticated and new users can be added to the individual databases.

When it comes to Hadoop and its ultimate storage system, HDFS, it is clear that a common framework for security is needed. Having said that, one can bypass anything by going directly to the HDFS file system.

We are also concerned that when data is ingested into Hive through temporary OS file storage, the data on the file system needs to be encrypted to avoid exposing client data. Most RDBMSs offer encrypted tables and columns, and I presume if and when data ends up in Hive it needs to be protected through encryption. I have not heard of any encryption utility within Hive yet.

Cheers,

Mich

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 9 March 2016 at 15:58, Alan Gates wrote:
> One way people have gotten around the lack of LDAP connectivity in HS2 has
> been to use Apache Knox. That project's goal is to provide a single login
> capability for Hadoop-related projects so that users can tie their LDAP or
> Active Directory servers into Hadoop.
>
> Alan.
>
> > On Mar 8, 2016, at 16:00, Mich Talebzadeh wrote:
> >
> > The current scenario resembles a three-tier architecture, but without the
> > security of the second tier. In a typical three-tier setup, users connecting
> > to the application server (read HiveServer2) are independently
> > authenticated and, if OK, the second tier creates new .NET-type or JDBC
> > threads to connect to the database, much like multi-threading. The problem, I
> > believe, is that HiveServer2 does not have that concept of handling
> > individual logins yet. HiveServer2 should be able to handle LDAP logins
> > as well. It is a useful layer to have.
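On the encryption point, one option worth noting: HDFS itself offers transparent encryption zones (since Hadoop 2.6), which cover data at rest, including staging files written under the zone. A sketch of the commands involved, assuming a Hadoop KMS is already configured (the key name and paths here are illustrative, not from any particular cluster):

```shell
# Create a key in the Hadoop KMS (requires hadoop.security.key.provider.path
# to point at a running KMS instance).
hadoop key create staging_key

# Mark an empty HDFS directory as an encryption zone; files written under it
# are encrypted transparently, with per-file keys wrapped by staging_key.
hdfs dfs -mkdir /user/hive/staging_secure
hdfs crypto -createZone -keyName staging_key -path /user/hive/staging_secure

# Verify the zone exists.
hdfs crypto -listZones
```

Pointing Hive's scratch/staging directories at such a zone would keep the temporary ingest files encrypted on disk, though this protects data at rest only, not column-level access the way RDBMS encrypted columns do.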
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > On 8 March 2016 at 23:28, Alex wrote:
> > Yes, when creating a Hive Context, a Hive Metastore client should be
> > created with a user that the Spark application will talk to the *remote*
> > Hive Metastore with. We would like to add a custom authorization plugin to
> > our remote Hive Metastore to authorize the query requests that the Spark
> > application is submitting, which would also add authorization for any other
> > applications hitting the Hive Metastore. Furthermore, we would like to
> > extend this so that we can submit "jobs" to our Spark application that will
> > allow us to run against the metastore as different users while leveraging
> > the abilities of our Spark cluster. But, as you mentioned, only one login
> > connects to the Hive Metastore and is shared among all HiveContext sessions.
> >
> > Likely the authentication would have to be completed either through a
> > secured Hive Metastore (Kerberos) or by having the requests go through
> > HiveServer2.
> >
> > --Alex
> >
> >
> > On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
> >> Hi,
> >>
> >> What do you mean by Hive Metastore Client? Are you referring to the Hive
> >> server login, much like beeline?
> >>
> >> Spark uses hive-site.xml to get the details of the Hive metastore and the
> >> login to the metastore, which could be any database. Mine is Oracle and, as
> >> far as I know, even in Hive 2, hive-site.xml has an entry for
> >> javax.jdo.option.ConnectionUserName that specifies the username to use
> >> against the metastore database.
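For readers following along, the metastore connection settings being discussed live in hive-site.xml. A minimal fragment, with the property names as Hive defines them but placeholder values (not any real cluster's):

```xml
<!-- Metastore backing database connection (values are placeholders) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@//dbhost:1521/HIVEDB</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.OracleDriver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>HIVEUSER</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>********</value>
</property>
```

Every process that reads this file connects to the backing database as the same single user, which is why all the sessions below show up under one login.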
> >> These are all multi-threaded JDBC connections to the
> >> database, the same login, as shown below:
> >>
> >> LOGIN    SID/serial# LOGGED IN S HOST    OS PID       Client PID  PROGRAM         MEM/KB Logical I/O Physical I/O ACT
> >> -------- ----------- ----------- ------- ------------ ----------- --------------- ------ ----------- ------------ ---
> >> HIVEUSER 67,6160     08/03 08:11 rhes564 oracle/20539 hduser/1234 JDBC Thin Clien  1,017          37            0 N
> >> HIVEUSER 89,6421     08/03 08:11 rhes564 oracle/20541 hduser/1234 JDBC Thin Clien  1,081         528            0 N
> >> HIVEUSER 112,561     08/03 10:45 rhes564 oracle/24624 hduser/1234 JDBC Thin Clien    889          37            0 N
> >> HIVEUSER 131,8811    08/03 08:11 rhes564 oracle/20543 hduser/1234 JDBC Thin Clien  1,017          37            0 N
> >> HIVEUSER 47,30114    08/03 10:45 rhes564 oracle/24626 hduser/1234 JDBC Thin Clien  1,017          37            0 N
> >> HIVEUSER 170,8955    08/03 08:11 rhes564 oracle/20545 hduser/1234 JDBC Thin Clien  1,017         323            0 N
> >>
> >> As I understand it, what you are suggesting is that each Spark user uses a
> >> different login to connect to the Hive metastore.
> >> As of now there is only one
> >> login that connects to the Hive metastore, shared among all:
> >>
> >> 2016-03-08T23:08:01,890 INFO [pool-5-thread-72]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=t
> >> 2016-03-08T23:18:10,432 INFO [pool-5-thread-81]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.216 cmd=source:50.140.197.216 get_tables: db=asehadoop pat=.*
> >>
> >> And this is an entry in the Hive log when a connection is made through the
> >> Zeppelin UI:
> >>
> >> 2016-03-08T23:20:13,546 INFO [pool-5-thread-84]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 84: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> >> 2016-03-08T23:20:13,547 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
> >> 2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is ORACLE
> >> 2016-03-08T23:20:13,550 INFO [pool-5-thread-84]: metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
> >>
> >> I am not sure there is currently any plan to allow different logins
> >> to the Hive Metastore, but it would add another level of security,
> >> though I am not sure how this would be authenticated.
> >>
> >> HTH
> >>
> >>
> >> Dr Mich Talebzadeh
> >>
> >> LinkedIn
> >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> On 8 March 2016 at 22:23, Alex F wrote:
> >> As of Spark 1.6.0, it is now possible to create new Hive Context
> >> sessions sharing various components, but right now the Hive Metastore Client
> >> is shared amongst each new Hive Context session.
> >>
> >> Are there any plans to create individual Metastore Clients for each
> >> Hive Context?
> >>
> >> Related to the question above, are there any plans to create an
> >> interface for customizing the username that the Metastore Client uses to
> >> connect to the Hive Metastore? Right now it either uses the user specified
> >> in an environment variable or the application's process owner.
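The audit-log excerpt earlier in the thread shows every metastore request tagged ugi=hduser regardless of who submitted it. A toy Python sketch of why (these are stand-in classes, not the actual Spark or Hive APIs): when the client is a process-wide singleton, the identity of the first session is the identity the metastore sees for everyone.

```python
class MetastoreClient:
    """Stand-in for the real Hive metastore client (hypothetical)."""

    def __init__(self, user):
        self.user = user

    def get_table(self, db, table):
        # The metastore audits the call under the client's user,
        # no matter which session issued it.
        return f"ugi={self.user} cmd=get_table : db={db} tbl={table}"


class SharedClientSession:
    """Models Spark 1.6 newSession(): per-session state, shared client."""

    _client = None  # one client per process, shared by all sessions

    def __init__(self, user):
        if SharedClientSession._client is None:
            # Only the first session's user ever reaches the metastore.
            SharedClientSession._client = MetastoreClient(user)
        self.user = user

    def get_table(self, db, table):
        return SharedClientSession._client.get_table(db, table)


first = SharedClientSession("hduser")
second = SharedClientSession("alex")  # a different "login", same shared client
print(second.get_table("test", "t"))  # audited as hduser, not alex
```

Per-session clients, as Alex F asks about, would amount to constructing a fresh `MetastoreClient(user)` per session instead of the shared singleton, at the cost of one metastore connection per session.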