hive-user mailing list archives

From Jörn Franke <>
Subject Re: Issue joining 21 HUGE Hive tables
Date Thu, 24 Mar 2016 06:36:49 GMT
Joining so many external tables is always an issue, whatever component you use. Your problem
is not Hive-specific; your data model seems to be messed up. First of all, the data should be
stored in an appropriate format, such as ORC or Parquet, and the tables should not be external.
Then you should use the right data types for the columns, e.g. an int instead of a varchar if a
column contains only numbers. After that, check whether you can prejoin the data, store it in
one big flat table, and run your queries on that.
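
To make the "prejoin" advice concrete, here is a minimal in-memory Python sketch of the idea (it is not Hive code): the joins against the small lookup tables are paid once, when the flat table is built, and every later query only filters or aggregates. The function `prejoin`, the `<fk>_<column>` naming scheme and the sample tables are hypothetical. In Hive itself the equivalent step would typically be a one-off `CREATE TABLE ... STORED AS ORC AS SELECT ...` over the joins.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def prejoin(fact_rows: List[Row], dims: Dict[str, Dict[Any, Row]]) -> List[Row]:
    """Denormalize once: copy dimension columns onto every fact row.

    `dims` maps a foreign-key column of the fact table to a lookup table
    (key -> dimension row). A missing key yields None for that dimension's
    columns, i.e. LEFT OUTER JOIN semantics.
    """
    # Gather each dimension's column names so unmatched rows can be padded.
    dim_cols = {
        fk: sorted({col for row in table.values() for col in row})
        for fk, table in dims.items()
    }
    flat: List[Row] = []
    for fact in fact_rows:
        out = dict(fact)
        for fk, table in dims.items():
            match: Optional[Row] = table.get(fact.get(fk))
            for col in dim_cols[fk]:
                out[f"{fk}_{col}"] = match.get(col) if match is not None else None
        flat.append(out)
    return flat


# Hypothetical sample data: one "huge" fact table and two small dimensions.
sales = [
    {"sale_id": 1, "store_id": 10, "product_id": 100, "amount": 25},
    {"sale_id": 2, "store_id": 11, "product_id": 999, "amount": 40},
]
stores = {10: {"city": "Berlin"}, 11: {"city": "Paris"}}
products = {100: {"category": "books"}}

# Pay the join cost once, at load time ...
flat = prejoin(sales, {"store_id": stores, "product_id": products})

# ... so that later queries are plain filters/aggregates with no joins.
berlin_total = sum(r["amount"] for r in flat if r["store_id_city"] == "Berlin")
```

Unmatched dimension keys (product 999 above) become None, mirroring a left outer join; an inner join would drop those rows instead.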

Then you should look at min/max indexes, Bloom filters, statistics, partitions, etc.
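
Min/max indexes and Bloom filters let a reader skip whole chunks of a file without decoding them. Hive's ORC writer keeps min/max statistics per stripe and per row group, and can build Bloom filters for the columns listed in the `orc.bloom.filter.columns` table property. The Python sketch below only illustrates the pruning logic under simplified assumptions: `RowGroup`, `BloomFilter` and `lookup` are hypothetical stand-ins, not ORC's actual data structures or hash functions.

```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: object) -> Iterator[int]:
        data = repr(value).encode("utf-8")
        for seed in range(self.num_hashes):
            # Derive k hash functions by salting SHA-256 with a seed byte.
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: object) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: object) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


class RowGroup:
    """A chunk of one column plus the statistics a writer would store for it."""

    def __init__(self, values: List[int]) -> None:
        self.values = values
        self.min_val = min(values)
        self.max_val = max(values)
        self.bloom = BloomFilter()
        for v in values:
            self.bloom.add(v)


def lookup(groups: List[RowGroup], key: int) -> Tuple[List[int], int]:
    """Equality lookup returning (matching values, number of groups scanned)."""
    hits: List[int] = []
    scanned = 0
    for g in groups:
        if key < g.min_val or key > g.max_val:
            continue  # min/max statistics rule this group out
        if not g.bloom.might_contain(key):
            continue  # Bloom filter: key is definitely not in this group
        scanned += 1  # only now would the group's data be read and decoded
        hits.extend(v for v in g.values if v == key)
    return hits, scanned


groups = [
    RowGroup(list(range(0, 1000, 2))),  # even numbers 0..998
    RowGroup(list(range(1000, 2000))),
    RowGroup(list(range(2000, 3000))),
]
```

Here `lookup(groups, 1500)` returns `([1500], 1)` because the min/max ranges exclude the other two groups, and `lookup(groups, 5000)` scans nothing. For a key such as 501, which falls inside the first group's range but is absent from it, only the Bloom filter can prune the group, and it does so probabilistically.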

Maybe you can post more details about the data model and the queries.

> On 24 Mar 2016, at 02:49, Sanka, Himabindu <> wrote:
> Hi Team,
> I need some input from you. I have a requirement in my project where I have to join
> 21 Hive external tables.
> Six of these tables are HUGE, holding 500 million records of data. The other 15 tables are
> smaller ones, around 100 to 1000 records each.
> When I run inner joins/left outer joins, the query takes hours to run.
> Please let me know some optimization techniques, or any other ecosystem components that
> perform better than Hive.
> Regards,
> Hima