hive-user mailing list archives

From Balaji Rao <sbalaji...@gmail.com>
Subject Re: HIVE and S3 via EMR?
Date Tue, 29 May 2012 21:27:02 GMT
The location should use the 's3://' scheme, not 's3n://'.
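
For the partitioning question below, here is a rough sketch of a table partitioned on dt, using the 's3://' scheme. It assumes the files sit in one directory per dt value under the bucket path; the table name from_to_by_dt and the date value are made up for illustration.

```sql
-- Hypothetical table name; dt becomes a partition column rather than a data column.
create external table from_to_by_dt(from_address string, to_address string)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://rjurney_public_web/from_to_date';

-- Register one partition; the dt value and subdirectory are assumptions.
alter table from_to_by_dt add partition (dt='2012-05-29')
location 's3://rjurney_public_web/from_to_date/dt=2012-05-29';
```

Queries that filter on dt then only read the matching directories.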

On Tue, May 29, 2012 at 5:19 PM, Russell Jurney
<russell.jurney@gmail.com> wrote:
> Ok, I spoke too soon.  Same error.  Crapola.  Still working on it.
>
>
> On Tue, May 29, 2012 at 2:19 PM, Russell Jurney <russell.jurney@gmail.com>
> wrote:
>>
>> I get an error when I create an external table.  btw - I can partition on
>> dt or from/to address.  I'm just not clear on how to partition - my efforts
>> fail.
>>
>> hive> create external table from_to(from_address string, to_address
>> string, dt string)
>>     >     row format delimited fields terminated by '\t' stored as
>> textfile location 's3n://rjurney_public_web/from_to_date';
>> FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid
>> hostname in URI s3n://rjurney_public_web/from_to_date
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>> However, I just upgraded to HIVE 0.9, and it works :)  No reason to use
>> the old stuff when I can scp the new one up.
>>
>> Thanks!
>>
>> On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <sbalajirao@gmail.com> wrote:
>>>
>>> If you are using Hive on EMR, you can create an external table directly
>>> over the data in S3:
>>>
>>> create external table from_to(from_address string, to_address string,
>>> dt string) row format delimited fields terminated by '\t' stored as
>>> textfile location 's3://rjurney_public_web/from_to_date';
>>>
>>> You could then:
>>>  select * from from_to;
>>>
>>> Balaji
>>>
>>> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
>>> <russell.jurney@gmail.com> wrote:
>>> > How do I load data from S3 into Hive using Amazon EMR?  I've booted a
>>> > small
>>> > cluster, and I want to load a 3-column TSV file from Pig into a table
>>> > like
>>> > this:
>>> >
>>> > create table from_to (from_address string, to_address string, dt
>>> > string);
>>> >
>>> >
>>> > When I run something like this:
>>> >
>>> > load data inpath 's3n://rjurney_public_web/from_to_date' into table
>>> > from_to;
>>> >
>>> >
>>> > I get errors:
>>> >
>>> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
>>> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file
>>> > systems
>>> > accepted. s3n file system is not supported.
>>> >
>>> >
>>> > There is no distcp on the master node of my EMR cluster, so I can't
>>> > copy it
>>> > over.  I've read the documentation... and so far after a day of trying,
>>> > I
>>> > can't load data into HIVE via EMR.
>>> >
>>> > What am I missing?  Thanks!
>>> > --
>>> > Russell
>>> > Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>>
>>
>>
>>
>> --
>> Russell
>> Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
