hive-user mailing list archives

From Balaji Rao <sbalaji...@gmail.com>
Subject Re: HIVE and S3 via EMR?
Date Tue, 29 May 2012 21:35:31 GMT
To partition on S3, one would create folders like:
s3://mybucket/path/dt=2012-05-20
s3://mybucket/path/dt=2012-05-21
s3://mybucket/path/dt=2012-05-22
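
The folder names follow Hive's key=value convention, one folder per partition value. As a minimal sketch, the Python below builds those prefixes for a range of days. The bucket and path are the placeholders from the example above, and the helper name partition_prefixes is hypothetical.

```python
from datetime import date, timedelta

def partition_prefixes(base, start, end):
    """Return one Hive-style 'dt=YYYY-MM-DD' prefix per day, start..end inclusive.

    Hypothetical helper: it only builds path strings, it does not touch S3.
    """
    base = base.rstrip("/")
    days = (end - start).days
    return [f"{base}/dt={(start + timedelta(days=i)).isoformat()}" for i in range(days + 1)]

for prefix in partition_prefixes("s3://mybucket/path", date(2012, 5, 20), date(2012, 5, 22)):
    print(prefix)
```

Using date.isoformat() keeps the values zero-padded (2012-05-20, not 2012-5-20), which matters later when dt is compared as a string.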

You can then use:
create external table from_to(from_address string, to_address string)
partitioned by (dt string) row format delimited fields terminated by
'\t' stored as textfile location 's3://mybucket/path';
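
Note that dt is not listed among the table's columns here: in a partitioned table its value comes from the folder name, while each file holds only the two tab-separated address fields. The Python below is a rough model of how one row is put together, not Hive's actual code. The function name read_row and the sample addresses are hypothetical.

```python
def read_row(line, partition_dir):
    """Model one row of from_to: two tab-separated fields plus dt from the folder name.

    Assumes the line has exactly two fields and the folder ends in 'key=value'.
    """
    from_address, to_address = line.rstrip("\n").split("\t")
    # The last path component, e.g. 'dt=2012-05-21', supplies the partition column.
    key, value = partition_dir.rstrip("/").rsplit("/", 1)[-1].split("=", 1)
    return {"from_address": from_address, "to_address": to_address, key: value}

print(read_row("alice@example.com\tbob@example.com\n", "s3://mybucket/path/dt=2012-05-21"))
```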

Then issue the command:
alter table from_to recover partitions;
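
recover partitions registers each dt=... folder under the table location as a partition. Where that command is not available, each partition can instead be added explicitly with ALTER TABLE ... ADD PARTITION ... LOCATION. The Python below is a sketch that only generates those statements from a list of folder names; it does not list S3 itself, and the function name and folder list are assumptions for illustration.

```python
def add_partition_statements(table, base, folders):
    """Turn 'key=value' folder names into Hive ALTER TABLE ... ADD PARTITION statements.

    Hypothetical helper: the folder names are supplied by the caller, not discovered.
    """
    base = base.rstrip("/")
    statements = []
    for folder in folders:
        name = folder.strip("/")
        key, value = name.split("=", 1)
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION ({key}='{value}') "
            f"LOCATION '{base}/{name}';"
        )
    return statements

folders = ["dt=2012-05-20", "dt=2012-05-21", "dt=2012-05-22"]
for stmt in add_partition_statements("from_to", "s3://mybucket/path", folders):
    print(stmt)
```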

You can then query the partitions:
select from_address, to_address, dt from from_to where dt >= '2012-05-21';
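
Because dt is declared as a string, dt >= '2012-05-21' is a lexicographic comparison. With zero-padded YYYY-MM-DD values that order matches calendar order, so the filter keeps exactly the later partitions. The Python below illustrates which partition values pass the filter; it models the comparison only, not Hive's query planner, and matching_partitions is a hypothetical name.

```python
def matching_partitions(values, lower_bound):
    """Keep partition values that satisfy dt >= lower_bound under string comparison."""
    return [v for v in values if v >= lower_bound]

values = ["2012-05-20", "2012-05-21", "2012-05-22"]
print(matching_partitions(values, "2012-05-21"))  # ['2012-05-21', '2012-05-22']

# Without zero-padding, string order and date order disagree:
print("2012-5-9" >= "2012-5-21")  # True, although May 9 is earlier than May 21
```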

On Tue, May 29, 2012 at 5:19 PM, Russell Jurney
<russell.jurney@gmail.com> wrote:
> I get an error when I create an external table. BTW, I can partition on dt
> or on from/to address; I'm just not clear on how to partition, and my
> efforts fail.
>
> hive> create external table from_to(from_address string, to_address string, dt string)
>     > row format delimited fields terminated by '\t' stored as textfile location 's3n://rjurney_public_web/from_to_date';
> FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
>
> However, I just upgraded to Hive 0.9, and it works :) No reason to use the
> old version when I can scp the new one up.
>
> Thanks!
>
> On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <sbalajirao@gmail.com> wrote:
>>
>> If you are using Hive on EMR, you can create a table directly over the
>> data on S3, like this:
>>
>> create external table from_to(from_address string, to_address string,
>> dt string) row format delimited fields terminated by '\t' stored as
>> textfile location 's3://rjurney_public_web/from_to_date';
>>
>> You could then run:
>>  select * from from_to;
>>
>> Balaji
>>
>> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
>> <russell.jurney@gmail.com> wrote:
>> > How do I load data from S3 into Hive using Amazon EMR? I've booted a
>> > small cluster, and I want to load a 3-column TSV file from Pig into a
>> > table like this:
>> >
>> > create table from_to (from_address string, to_address string, dt string);
>> >
>> >
>> > When I run something like this:
>> >
>> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
>> >
>> >
>> > I get errors:
>> >
>> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
>> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file
>> > systems accepted. s3n file system is not supported.
>> >
>> >
>> > There is no distcp on the master node of my EMR cluster, so I can't copy
>> > it over. I've read the documentation, and so far, after a day of
>> > trying, I can't load data into Hive via EMR.
>> >
>> > What am I missing?  Thanks!
>> > --
>> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
