hive-user mailing list archives

From <richin.j...@nokia.com>
Subject Performance Issues in Hive with S3 and Partitions
Date Tue, 24 Jul 2012 03:33:32 GMT
Hi,

Sorry, this is an AWS-specific Hive question. I have two external Hive tables for my custom
logs:

1. A flat directory structure on AWS S3, with no partitions and files in bz2-compressed format
(a few big files).

2. Three levels of partitions on AWS S3 (a lot of small uncompressed files).

I noticed that my queries on the partitioned table take forever to run. The same queries
run fine and finish quickly on the table with no partitions.
Am I missing something? I suspect this has something to do with the way S3 behaves.

An example query:

select id,
       (max(unix_timestamp(ts, "MM/dd/yyyy HH:mm")) - min(unix_timestamp(ts, "MM/dd/yyyy HH:mm")))/(60*60)
from logs
group by id;
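
To spell out what the query computes: for each id, it takes the gap between the earliest and
latest ts in seconds and divides by 60*60 to get hours. Here is a one-row sketch with made-up
timestamps that are two and a half hours apart. It assumes a Hive version that accepts a SELECT
without a FROM clause, which older releases do not.

```sql
-- Hypothetical timestamps, for illustration only; not taken from the real logs.
-- unix_timestamp(string, pattern) returns seconds since the epoch, so the
-- difference is in seconds and dividing by 60*60 converts it to hours.
select (unix_timestamp('07/24/2012 03:30', 'MM/dd/yyyy HH:mm')
      - unix_timestamp('07/24/2012 01:00', 'MM/dd/yyyy HH:mm'))/(60*60);
```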

Thanks,
Richin
