hive-user mailing list archives

From Dean Wampler <dean.wamp...@thinkbiganalytics.com>
Subject Re: Alter table is giving error
Date Tue, 27 Nov 2012 23:12:59 GMT
Right, your CREATE TABLE statement now points to your S3 location, so you
don't need to do anything else. However, queries will pull this data from
S3 every time, which will be a little slower, and you'll incur a small
charge for reading from S3. Parking data there is great when you only need
occasional access; for frequent access, an HDFS location is better.

As a side note, the error message tells you that you can't use an S3
location in a LOAD DATA statement. So, if you ever define a
managed/internal table and want to populate it with S3 data, you'll have to
copy the data from S3 to your cluster first, then load it from there.
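A minimal sketch of that two-step workflow, reusing the someidtable from later in this thread; the bucket name and staging path are hypothetical:

```sql
-- Sketch only; the bucket and HDFS staging path are hypothetical.
-- Step 1 (shell, outside Hive): copy the file from S3 into HDFS first, e.g.
--   hadoop fs -cp s3n://my-bucket/logs/someidexcel.csv /tmp/staging/
-- Step 2: LOAD DATA accepts "file" and "hdfs" paths, so load from HDFS:
LOAD DATA INPATH '/tmp/staging/someidexcel.csv' INTO TABLE someidtable;
```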

dean

On Tue, Nov 27, 2012 at 2:53 PM, Mark Grover <grover.markgrover@gmail.com> wrote:

> Chunky,
> You have an external table that points at the location s3://location/
>
> No need to load the data. All files (or partitions folders) under
> s3://location/ should be available via the table.
> Just run your queries on it.
>
> Load data will move the data from one HDFS location to another. You don't
> need/want to do that in this case.
>
> Mark
>
> On Tue, Nov 27, 2012 at 12:18 PM, Chunky Gupta <chunky.gupta@vizury.com> wrote:
>
>> Hi,
>>
>> Now when I am trying to load a CSV file into a table I created, it's not
>> working.
>>
>> I created a table :-
>> CREATE EXTERNAL TABLE someidtable (
>> someid STRING
>> )
>> ROW FORMAT
>> DELIMITED FIELDS TERMINATED BY '\t'
>> LINES TERMINATED BY '\n'
>> LOCATION 's3://location/';
>>
>> Then
>>
>> LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;
>>
>> It gives this error:-
>> "Error in semantic analysis: Line 1:17 Invalid path
>> ''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems
>> accepted"
>>
>> Please help me in resolving this issue.
>> Thanks,
>> Chunky.
>>
>>
>> On Wed, Nov 7, 2012 at 6:43 PM, Chunky Gupta <chunky.gupta@vizury.com> wrote:
>>
>>> Okay Mark, I will be looking into this JIRA regularly.
>>> Thanks again for helping.
>>> Chunky.
>>>
>>>
>>> On Wed, Nov 7, 2012 at 12:22 PM, Mark Grover <
>>> grover.markgrover@gmail.com> wrote:
>>>
>>>> Chunky,
>>>> I just tried it myself. It turns out that the directory you are adding
>>>> as partition has to be empty for msck repair to work. This is obviously
>>>> sub-optimal and there is a JIRA in place (
>>>> https://issues.apache.org/jira/browse/HIVE-3231) to fix it.
>>>>
>>>> So, I'd suggest you keep an eye out for the next version for that fix
>>>> to come in. In the meanwhile, run msck after you create your partition
>>>> directory but before you populate your directory with data.
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta <chunky.gupta@vizury.com> wrote:
>>>>
>>>>> Hi Mark,
>>>>> Sorry, I forgot to mention. I have also tried
>>>>>                 msck repair table <Table name>;
>>>>> and it gave the same output as msck alone.
>>>>> Do I need any other settings for this to work? I set up Hadoop and
>>>>> Hive from scratch on EC2.
>>>>>
>>>>> Thanks,
>>>>> Chunky.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <
>>>>> grover.markgrover@gmail.com> wrote:
>>>>>
>>>>>> Chunky,
>>>>>> You should have run:
>>>>>> msck repair table <Table name>;
>>>>>>
>>>>>> Sorry, I should have made it clear in my last reply. I have added an
>>>>>> entry to the Hive wiki for the benefit of others:
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta <chunky.gupta@vizury.com> wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>> I didn't get any error.
>>>>>>> I ran this on hive console:-
>>>>>>>          "msck table Table_Name;"
>>>>>>> It says Ok and showed the execution time as 1.050 sec.
>>>>>>> But when I checked partitions for table using
>>>>>>>           "show partitions Table_Name;"
>>>>>>> It didn't show me any partitions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chunky.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover <
>>>>>>> grover.markgrover@gmail.com> wrote:
>>>>>>>
>>>>>>>> Glad to hear, Chunky.
>>>>>>>>
>>>>>>>> Out of curiosity, what errors did you get when using msck?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <
>>>>>>>> chunky.gupta@vizury.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mark,
>>>>>>>>> I tried msck, but it is not working for me. I have written a
>>>>>>>>> Python script to partition the data individually.
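Such a script typically just emits one ADD PARTITION statement per date. A hedged sketch of the HiveQL it would generate; the table name, partition column, and locations are hypothetical:

```sql
-- One statement per day; my_logs, dt, and the S3 layout are hypothetical.
ALTER TABLE my_logs ADD IF NOT EXISTS
  PARTITION (dt='2012-11-05') LOCATION 's3://my-location/data/dt=2012-11-05/';
ALTER TABLE my_logs ADD IF NOT EXISTS
  PARTITION (dt='2012-11-06') LOCATION 's3://my-location/data/dt=2012-11-06/';
```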
>>>>>>>>>
>>>>>>>>> Thank you Edward, Mark and Dean.
>>>>>>>>> Chunky.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <
>>>>>>>>> grover.markgrover@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Chunky,
>>>>>>>>>> I have used the "recover partitions" command on EMR, and that
>>>>>>>>>> worked fine.
>>>>>>>>>>
>>>>>>>>>> However, take a look at
>>>>>>>>>> https://issues.apache.org/jira/browse/HIVE-874. It seems the msck
>>>>>>>>>> command in Apache Hive does the same thing. Try it out and let us
>>>>>>>>>> know how it goes.
>>>>>>>>>>
>>>>>>>>>> Mark
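The two commands being compared, side by side, using the placeholder table name XXX from the original post:

```sql
-- Amazon EMR Hive extension (not recognized by stock Apache Hive 0.8.1):
ALTER TABLE XXX RECOVER PARTITIONS;
-- Apache Hive equivalent (see HIVE-874):
MSCK REPAIR TABLE XXX;
```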
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <
>>>>>>>>>> edlinuxguru@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Recover partitions should work the same way for different file
>>>>>>>>>>> systems.
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
>>>>>>>>>>> <dean.wampler@thinkbiganalytics.com> wrote:
>>>>>>>>>>> > Writing a script to add the external partitions individually
>>>>>>>>>>> > is the only way I know of.
>>>>>>>>>>> >
>>>>>>>>>>> > Sent from my rotary phone.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Nov 5, 2012, at 8:19 AM, Chunky Gupta <chunky.gupta@vizury.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi Dean,
>>>>>>>>>>> >
>>>>>>>>>>> > Actually I had a Hadoop and Hive cluster on EMR, with S3 storage
>>>>>>>>>>> > containing logs that update daily, partitioned by date (dt), and
>>>>>>>>>>> > I was using this recover partition command.
>>>>>>>>>>> > Now I want to shift to EC2 and run my own Hadoop and Hive
>>>>>>>>>>> > cluster. So what is the alternative to recover partitions in
>>>>>>>>>>> > this case, if you have any idea?
>>>>>>>>>>> > I found one way, adding the partitions individually, but I
>>>>>>>>>>> > would have to write a script to do that for all dates. Is there
>>>>>>>>>>> > any easier way than this?
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Chunky
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
>>>>>>>>>>> > <dean.wampler@thinkbiganalytics.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> The RECOVER PARTITIONS is an enhancement added by Amazon to
>>>>>>>>>>> >> their version of Hive.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>>>>>>>>>> >>
>>>>>>>>>>> >> <shameless-plug>
>>>>>>>>>>> >>   Chapter 21 of Programming Hive discusses this feature and
>>>>>>>>>>> >> other aspects of using Hive in EMR.
>>>>>>>>>>> >> </shameless-plug>
>>>>>>>>>>> >>
>>>>>>>>>>> >> dean
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <chunky.gupta@vizury.com>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Hi,
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I have a cluster set up on EC2 with Hadoop version 0.20.2 and
>>>>>>>>>>> >>> Hive version 0.8.1 (I configured everything). I have created
>>>>>>>>>>> >>> a table using:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> CREATE EXTERNAL TABLE XXX ( YYY ) PARTITIONED BY ( ZZZ ) ROW
>>>>>>>>>>> >>> FORMAT DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
>>>>>>>>>>> >>> 's3://my-location/data/';
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Now I am trying to recover partitions using:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> but I am getting this error: "FAILED: Parse Error: line 1:12
>>>>>>>>>>> >>> cannot recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in
>>>>>>>>>>> >>> alter table statement"
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Doing the same steps on a cluster set up on EMR with Hadoop
>>>>>>>>>>> >>> version 1.0.3 and Hive version 0.8.1 (configured by EMR)
>>>>>>>>>>> >>> works fine.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> So is this a version issue, or am I missing some
>>>>>>>>>>> >>> configuration changes in the EC2 setup? I am not able to find
>>>>>>>>>>> >>> a solution to this problem on the internet. Please help me.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>> >>> Chunky.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Dean Wampler, Ph.D.
>>>>>>>>>>> >> thinkbiganalytics.com
>>>>>>>>>>> >> +1-312-339-1330
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
