hive-user mailing list archives

From Jason Michael
Subject Join optimization for star schemas
Date Tue, 14 Jul 2009 04:44:31 GMT
In our Hive instance, we have one large fact table that joins to several dimension tables
on integer keys. I know from reading the Language Manual that, when ordering joins, it is best
to put the largest table last in the sequence in order to minimize memory usage. This doesn't
work when you want to join the large fact table to more than one dimension.
Something like:

select ... from small_table1 join big_table on ... join small_table2 on ...

I have to imagine this is a pretty common pattern. Is there any guidance for doing this sort
of star schema join?
