hive-user mailing list archives

From "Pradeep Kamath" <prade...@yahoo-inc.com>
Subject RE: Thrift metastore server and dfs file owner
Date Tue, 20 Jul 2010 17:10:22 GMT
In addition to the options below, if there is some way to add custom
code to the thrift clients, that could be a third option. From what
little I know of thrift, the client code is generated and there is no
way to add extra logic to its methods, but if there is a way to do
that, it might be the best option.

 

________________________________

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 
Sent: Monday, July 19, 2010 1:09 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

I agree this will be an issue for direct thrift clients. How about the
following options:

 

1) Add a conf variable - "strict.owner.mode" - if this is set to true on
the server, dirs will not be created by the server and will instead be
created on the client (both client and server should have the same
value, true or false).

OR

2) Add a new API method in the thrift API which takes an extra Boolean
arg saying whether or not to create dirs. The HiveMetaStoreClient code
will use this new API with a "false" argument value and create the dir
on the client side. The issue with this is that existing thrift clients
would still be calling the current API method, which would create dirs
as the thrift server user. So depending on whether you create the table
using thrift (with the old method) or the CLI, you get different
results. The old method could be deprecated and the thrift clients can
migrate to the new one.
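
A minimal sketch of option 2, assuming a toy in-memory model rather than the real generated thrift interface. SketchMetastore, SketchClient and the createDirs parameter are hypothetical names used only for illustration; directories are recorded as "location:owner" strings instead of being created on DFS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the metastore server; NOT the real Hive Thrift API.
class SketchMetastore {
    final List<String> tables = new ArrayList<>();
    // "location:owner" for each directory the server itself created.
    final List<String> serverDirs = new ArrayList<>();
    private final String serverUser;

    SketchMetastore(String serverUser) {
        this.serverUser = serverUser;
    }

    /** Existing call: unchanged behaviour, the server creates the dir. */
    @Deprecated
    void createTable(String name, String location) {
        createTable(name, location, true);
    }

    /** Proposed call (option 2): the caller chooses who creates the dir. */
    void createTable(String name, String location, boolean createDirs) {
        tables.add(name);
        if (createDirs) {
            serverDirs.add(location + ":" + serverUser);
        }
    }
}

// Hypothetical stand-in for HiveMetaStoreClient.
class SketchClient {
    final List<String> clientDirs = new ArrayList<>();
    private final String clientUser;

    SketchClient(String clientUser) {
        this.clientUser = clientUser;
    }

    void createTable(SketchMetastore ms, String name, String location) {
        ms.createTable(name, location, false);        // metadata only
        clientDirs.add(location + ":" + clientUser);  // dir owned by the caller
    }
}
```

In this model an old-style caller still ends up with a directory owned by the server user, while a caller going through SketchClient gets one owned by itself, which is the inconsistency described above.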

 

Thoughts?

 

(This directory creation/deletion is relevant to create table/drop
table/add partition/alter table/alter partition I think)

 

Pradeep 

 

-----Original Message-----
From: Paul Yang [mailto:pyang@facebook.com] 
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

 

That approach would work for the CLI, but then the semantics for the
create table/create partition calls for thrift clients would be
different - it would no longer create the table directory. This might be
a problem if there are scripts that rely on this property for
copying/moving files. Also, table renaming code would need to be
modified as well.

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Monday, July 19, 2010 10:24 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

I was thinking about this a little more and was wondering if the
following alternative approach is feasible:

Instead of the Metastore code creating the directories, why not have
HiveMetaStoreClient create them in createTable() after the table is
created - i.e. it can do a getTable().getSd().getLocation() and perform
wh.mkdirs() on that path. We could do the same thing with
addPartition().

 

This way, we can have the metastore thrift server running as a
non-hdfs-superuser. Also, we no longer need to keep track of user/group
information since the client is already running with the right
user/group credentials.
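
A rough sketch of this client-side flow, assuming the local filesystem (java.nio.file) as a stand-in for DFS. MetadataOnlyStore and DirCreatingClient are hypothetical names, and Files.createDirectories() plays the role of wh.mkdirs(); none of this is the actual Hive code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical metastore that only records metadata and never touches the filesystem.
class MetadataOnlyStore {
    private final Map<String, Path> locations = new HashMap<>();

    void createTable(String name, Path location) {
        locations.put(name, location);
    }

    Path getLocation(String name) {
        return locations.get(name);
    }
}

// Hypothetical client: registers the table, then creates the directory itself.
class DirCreatingClient {
    private final MetadataOnlyStore store;

    DirCreatingClient(MetadataOnlyStore store) {
        this.store = store;
    }

    Path createTable(String name, Path location) {
        store.createTable(name, location);
        // Analogue of getTable().getSd().getLocation(): read the location back.
        Path stored = store.getLocation(name);
        try {
            // Runs in the client process, so the directory belongs to the
            // user running the client, not the user running the server.
            return Files.createDirectories(stored);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same pattern would apply to an addPartition() analogue: register the partition first, then create its directory from the client process.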

 

Thoughts?

 

Pradeep

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Thursday, July 15, 2010 10:23 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Currently group information is not present in the Table and both owner
and group information are absent from Database. If these are added to
these classes, we could change Warehouse.mkdirs(). This method is also
called from addPartition(); should we just use the table's owner/group
in this case? That could potentially fail in the non-thrift case if some
other user is creating the partitions, OR we would need to add
owner/group to Partition as well, with the implication that table and
partition owners could differ, causing query failures.

 

Paul's concern about security is valid but is there any other way around
this?

 

Pradeep

 

-----Original Message-----

From: Paul Yang [mailto:pyang@facebook.com] 

Sent: Wednesday, July 14, 2010 3:18 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

Yeah, you could overload Warehouse.mkdirs() to allow specification of an
owner/group and then use FileSystem.setOwner() within the method.

 

If the thrift server has full permissions for DFS though, wouldn't this
present a security hole? 
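
A minimal sketch of the overload idea, assuming an in-memory stand-in for the filesystem. SketchFs and SketchWarehouse are hypothetical classes, not Hadoop's FileSystem or Hive's Warehouse, and the rule that only the superuser may change ownership is modelled as described elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory filesystem: maps each path to "user:group".
class SketchFs {
    final Map<String, String> owners = new HashMap<>();
    private final String processUser;
    private final String superUser;

    SketchFs(String processUser, String superUser) {
        this.processUser = processUser;
        this.superUser = superUser;
    }

    // New directories belong to whoever runs the process.
    boolean mkdirs(String path) {
        owners.put(path, processUser + ":supergroup");
        return true;
    }

    // Modelled rule: only the superuser may hand a path to another user.
    void setOwner(String path, String user, String group) {
        if (!processUser.equals(superUser)) {
            throw new SecurityException(processUser + " may not chown " + path);
        }
        if (!owners.containsKey(path)) {
            throw new IllegalArgumentException("no such path: " + path);
        }
        owners.put(path, user + ":" + group);
    }
}

// Hypothetical warehouse helper with the proposed overload.
class SketchWarehouse {
    private final SketchFs fs;

    SketchWarehouse(SketchFs fs) {
        this.fs = fs;
    }

    boolean mkdirs(String path) {
        return fs.mkdirs(path);
    }

    // Overload: create the directory, then assign it to the requested owner.
    boolean mkdirs(String path, String owner, String group) {
        if (!fs.mkdirs(path)) {
            return false;
        }
        fs.setOwner(path, owner, group);
        return true;
    }
}
```

The overload only succeeds when the server process is the superuser, which is the trade-off behind the security question above.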

 

-----Original Message-----

From: Ashish Thusoo [mailto:athusoo@facebook.com] 

Sent: Wednesday, July 14, 2010 12:34 PM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

We could just fix this in Warehouse.java so that the mkdirs call makes
the directories according to the owner field that is passed to the
table? That probably would be a simple fix for this, no?

 

Ashish

 

-----Original Message-----

From: Pradeep Kamath [mailto:pradeepk@yahoo-inc.com] 

Sent: Wednesday, July 14, 2010 11:14 AM

To: hive-user@hadoop.apache.org

Subject: RE: Thrift metastore server and dfs file owner

 

<name>dfs.permissions</name>
<value>true</value>
..
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>

 

You mentioned: "I think the thrift server can use the dfs processor." -
were you suggesting the metastore implementation in HiveMetaStore should
always do chown user:user in create_table_core() (or selectively look at
the conf, know that it is being run as a thrift server, and chown only
in that case)?

 

Pradeep

 

-----Original Message-----

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

Sent: Tuesday, July 13, 2010 4:52 PM

To: hive-user@hadoop.apache.org

Subject: Re: Thrift metastore server and dfs file owner

 

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <pradeepk@yahoo-inc.com>
wrote:

> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift
> server is running as)
> drwxr-xr-x   - root supergroup          0 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2
>

> -----Original Message-----

> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]

> Sent: Tuesday, July 13, 2010 2:47 PM

> To: hive-user@hadoop.apache.org

> Subject: Re: Thrift metastore server and dfs file owner

> 

> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <pradeepk@yahoo-inc.com> wrote:
>> Hi,
>>
>>    I suspect this is true but wanted to confirm: If I start a thrift
>> metastore service as user "joe" then all internal tables created will
>> have directories under the warehouse directory owned by "joe"
>> regardless of the actual user running the create table statement - is
>> this correct? There is no way for the thrift server to create the
>> directory as the actual user?
>> However if thrift service is not used and the hive client directly
>> works against the metastore database, then the directories are
>> created by the actual user - is this correct?

>> 

>> 

>> 

>> Thanks,

>> 

>> Pradeep

> 

> The hive web interface does this:

> 

>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","

>        + auth.getGroups()[0]);

>    queries.add("set user.name=" + auth.getUser());

> 

> You should be able to accomplish the same thing using set commands 

> with the Thrift Server to impersonate.

> 

> Regards,

> Edward

> 

 

You are right. That technique may only affect files created during the
map/reduce job. I think the thrift server can use the dfs processor.

 

hive> dfs -chown user:user /user/hive/warehouse/foo2;

 

Questions:

Who is your hadoop superuser?

Are you enforcing dfs permissions?

 

If you are enforcing permissions, only the hadoop superuser (hadoop) will
be able to chown files to other users and groups.

