hive-user mailing list archives

From Sandeep Giri <sand...@cloudxlab.com>
Subject Re: Why does the user need write permission on the location of external hive table?
Date Mon, 06 Jun 2016 19:16:29 GMT
Hi Mich,

Thank you for your response.

My question is very simple: how do you process huge read-only data in
HDFS using Hive?


Regards,
Sandeep Giri,
+1 347 781 4573 (US)
+91-953-899-8962 (IN)

www.CloudxLab.com
Phone: +1 (412) 568-3901 (Office)


On Mon, Jun 6, 2016 at 10:14 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Well Sandeep, the permissions on HDFS resemble those of the Linux file
> system.
>
> For security reasons, it does not allow you to write to that file. An
> external table in Hive is just an interface.
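>
> Incidentally, the checkAccess frames in your stack trace look like a
> permission probe (FileSystem.access) rather than an actual write, which is
> what the metastore's storage-based authorization provider performs at
> CREATE time. That is only a guess about your cluster, but if hive-site.xml
> contains something like the following, it would explain a WRITE check even
> for an external table:
>
> <property>
>   <name>hive.security.metastore.authorization.manager</name>
>   <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
> </property>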
>
> Is there any reason why you do not have access to that file? Can you try to
> log in with beeline with a username and password?
>
> The data is immutable. What is the use case for this table? Are you going
> to use the data later in an app/Hive, and if so, do you have permission to
> read it?
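>
> If copying the data is not an option, one admin-side workaround, assuming
> HDFS ACLs are enabled (dfs.namenode.acls.enabled set to true on the
> NameNode), would be to grant your user write access on just the table
> directory, e.g.:
>
> hdfs dfs -setfacl -m user:sandeep:rwx /data/SentimentFiles/SentimentFiles/upload/data/tweets_raw
>
> The files underneath can stay read-only; only the directory inode named in
> the exception needs to pass the check.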
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 6 June 2016 at 16:59, Sandeep Giri <sandeep@cloudxlab.com> wrote:
>
>> Yes, Mich, that's right. That folder is read-only to me.
>>
>> That's my question: why do we need modification permission on the
>> location while creating an external table?
>>
>> This data is read-only. In Hive, how can we process huge data on which
>> we don't have write permissions? Is cloning this data the only
>> possibility?
>> On May 31, 2016 3:15 PM, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
>> wrote:
>>
>>> Right, that directory belongs to hdfs:hdfs, and no one else bar that user
>>> can write to it.
>>>
>>> If you are connecting via beeline, you need to specify the user and
>>> password:
>>>
>>> beeline -u jdbc:hive2://rhes564:10010/default -d org.apache.hive.jdbc.HiveDriver -n hduser -p xxxx
>>>
>>> When I look at the permissions, I see that only hdfs can write to it, not
>>> user sandeep.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 31 May 2016 at 09:20, Sandeep Giri <sandeep@cloudxlab.com> wrote:
>>>
>>>> Yes, when I run hadoop fs, it lists the files correctly.
>>>>
>>>> hadoop fs -ls /data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/
>>>> Found 30 items
>>>> -rw-r--r--   3 hdfs hdfs       6148 2015-12-04 15:19 /data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/.DS_Store
>>>> -rw-r--r--   3 hdfs hdfs     803323 2015-12-04 15:19 /data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/FlumeData.1367523670393.gz
>>>> -rw-r--r--   3 hdfs hdfs     284355 2015-12-04 15:19 /data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/FlumeData.1367523670394.gz
>>>> ....
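>>>>
>>>> The files themselves are world-readable, though; the failing check is on
>>>> the directory. The -d flag lists the directory entry rather than its
>>>> contents, so this should show the drwxr-xr-x mode that appears in the
>>>> exception:
>>>>
>>>> hadoop fs -ls -d /data/SentimentFiles/SentimentFiles/upload/data/tweets_raw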
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, May 31, 2016 at 1:42 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Is this location correct and valid?
>>>>>
>>>>> LOCATION '/data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/'
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 31 May 2016 at 08:50, Sandeep Giri <sandeep@cloudxlab.com> wrote:
>>>>>
>>>>>> Hi Hive Team,
>>>>>>
>>>>>> As per my understanding, in Hive, you can create two kinds of tables:
>>>>>> Managed and External.
>>>>>>
>>>>>> In the case of a managed table, you own the data, and hence when you
>>>>>> drop the table, the data is deleted.
>>>>>>
>>>>>> In the case of an external table, you don't have ownership of the data,
>>>>>> and hence when you delete such a table, the underlying data is not
>>>>>> deleted. Only the metadata is deleted.
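>>>>>>
>>>>>> To make the distinction concrete, a minimal sketch (the table names and
>>>>>> column here are placeholders, not from my actual schema):
>>>>>>
>>>>>> -- managed: data lives under the warehouse dir (by default
>>>>>> -- /user/hive/warehouse); DROP TABLE deletes the data too
>>>>>> CREATE TABLE managed_t (id BIGINT);
>>>>>>
>>>>>> -- external: Hive only records the pointer; DROP TABLE removes
>>>>>> -- the metadata and leaves /some/existing/hdfs/path untouched
>>>>>> CREATE EXTERNAL TABLE external_t (id BIGINT)
>>>>>> LOCATION '/some/existing/hdfs/path';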
>>>>>>
>>>>>> Now, recently I have observed that you cannot create an external
>>>>>> table over a location on which you don't have write (modification)
>>>>>> permissions in HDFS. I completely fail to understand this.
>>>>>>
>>>>>> Use case: It is quite common that the data you are churning is huge
>>>>>> and read-only. So, to churn such data via Hive, will you have to copy
>>>>>> this huge data to a location on which you have write permissions?
>>>>>>
>>>>>> Please help.
>>>>>>
>>>>>> My data is located in an HDFS folder
>>>>>> (/data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/) on which
>>>>>> I only have read-only permission, and I am trying to execute the
>>>>>> following command:
>>>>>>
>>>>>> CREATE EXTERNAL TABLE tweets_raw (
>>>>>>         id BIGINT,
>>>>>>         created_at STRING,
>>>>>>         source STRING,
>>>>>>         favorited BOOLEAN,
>>>>>>         retweet_count INT,
>>>>>>         retweeted_status STRUCT<
>>>>>>         text:STRING,
>>>>>>         users:STRUCT<screen_name:STRING,name:STRING>>,
>>>>>>         entities STRUCT<
>>>>>>         urls:ARRAY<STRUCT<expanded_url:STRING>>,
>>>>>>         user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
>>>>>>         hashtags:ARRAY<STRUCT<text:STRING>>>,
>>>>>>         text STRING,
>>>>>>         user1 STRUCT<
>>>>>>         screen_name:STRING,
>>>>>>         name:STRING,
>>>>>>         friends_count:INT,
>>>>>>         followers_count:INT,
>>>>>>         statuses_count:INT,
>>>>>>         verified:BOOLEAN,
>>>>>>         utc_offset:STRING, -- was INT but nulls are strings
>>>>>>         time_zone:STRING>,
>>>>>>         in_reply_to_screen_name STRING,
>>>>>>         year int,
>>>>>>         month int,
>>>>>>         day int,
>>>>>>         hour int
>>>>>>         )
>>>>>>         ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
>>>>>>         WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
>>>>>>         LOCATION '/data/SentimentFiles/SentimentFiles/upload/data/tweets_raw/'
>>>>>>         ;
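>>>>>>
>>>>>> The OpenX JsonSerDe is not bundled with Hive, so on a cluster where it
>>>>>> is not already on the classpath, the statement above would first need
>>>>>> the jar registered; the path below is illustrative, not my actual
>>>>>> location:
>>>>>>
>>>>>> ADD JAR /path/to/json-serde-with-dependencies.jar;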
>>>>>>
>>>>>> It throws the following error:
>>>>>>
>>>>>> FAILED: Execution Error, return code 1 from
>>>>>> org.apache.hadoop.hive.ql.exec.DDLTask.
>>>>>> MetaException(message:java.security.AccessControlException: Permission
>>>>>> denied: user=sandeep, access=WRITE,
>>>>>> inode="/data/SentimentFiles/SentimentFiles/upload/data/tweets_raw":hdfs:hdfs:drwxr-xr-x
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1729)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:8348)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:1978)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1443)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>         at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Sandeep Giri,
>>>>>> +1-(347) 781-4573 (US)
>>>>>> +91-953-899-8962 (IN)
>>>>>> www.CloudxLab.com  (A Hadoop cluster for practicing)
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Sandeep Giri,
>>>> +1-(347) 781-4573 (US)
>>>> +91-953-899-8962 (IN)
>>>> www.CloudxLab.com
>>>>
>>>
>>>
>
