hive-user mailing list archives

From Pedro Figueiredo <p...@89clouds.com>
Subject Re: Hive on EMR on S3 : Beginner
Date Sat, 25 Aug 2012 08:30:40 GMT
Hi,

On 25 Aug 2012, at 05:58, Ravi Shetye <ravi.shetye@vizury.com> wrote:

> Thanks Richin and Pedro,
> So, a final clarification:
>     Another way of doing it, apart from dynamic partitioning, is to create your directories like below, either manually or in the ETL process you might be running to get the table data. It is pretty easy.
> 
> 	s3://ravi/logs/adv_id=123/date=2012-01-01/log.gz
> 	s3://ravi/logs/adv_id=456/date=2012-01-02/log.gz
> 	s3://ravi/logs/adv_id=123/date=2012-01-03/log.gz
> 
> 1) Since I have used PARTITIONED BY (adv_id STRING, date STRING), the Hive system will read the directory name adv_id=123 and understand that the data within this directory can be accessed by a pseudo column adv_id?

Yes.

> 2) It would be wrong if I use PARTITIONED BY (date STRING, adv_id STRING) and keep the same directory structure?

Yes, the order of the fields in PARTITIONED BY must match the structure.

> 3) Also, it won't work if I store data in s3://ravi/logs/123/2012-01-01/log.gz ?

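No, that won't work. Each directory name needs the xxx= prefix, i.e. adv_id=123/date=2012-01-01 rather than 123/2012-01-01.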
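
For reference, here is a minimal sketch of a table definition that would match the layout above. The table name logs and the single column line are placeholders I made up, since the thread doesn't say what the log files contain. RECOVER PARTITIONS is the Amazon EMR Hive command; on stock Apache Hive the equivalent should be MSCK REPAIR TABLE logs.

```sql
-- Sketch only: the table name logs and the column line are hypothetical.
CREATE EXTERNAL TABLE logs (
  line STRING
)
-- Order matters: adv_id is the outer directory level, date the inner one.
-- Backticks are used because newer Hive versions treat date as a reserved word.
PARTITIONED BY (adv_id STRING, `date` STRING)
LOCATION 's3://ravi/logs/';

-- Amazon EMR Hive: scan the location and register each key=value directory.
ALTER TABLE logs RECOVER PARTITIONS;

-- The partition columns can then be filtered on like ordinary columns.
SELECT COUNT(*) FROM logs WHERE adv_id = '123' AND `date` = '2012-01-01';
```

With this definition, a query that filters on adv_id and date should only read the files under the matching directories.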

Cheers,

Pedro