hadoop-hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Putting the big table rightmost in the join
Date Fri, 19 Feb 2010 19:25:00 GMT
On Fri, Feb 19, 2010 at 12:35 AM, Yongqiang He
<heyongqiang@software.ict.ac.cn> wrote:
> Hi Edward,
> You can do it with the STREAMTABLE hint. Hive will treat the table named in
> that hint as the rightmost one.
> -yongqiang
> On 2/18/10 3:21 PM, "Edward Capriolo" <edlinuxguru@gmail.com> wrote:
>> I have worked through this issue.
>> * When doing Join, please put the table with big number of rows
>> containing the same join key to
>> the rightmost in the JOIN clause. Otherwise we may see OutOfMemory errors.
>> This advice does work, but should we open up a jira to create a simple
>> optimizer that does this?
>> Edward

I do not understand the need for the hint. A user can rewrite the query, can't they?

select ... from a join b on (...)
select ... from b join a on (...)

What I am asking is: should we add an optimizer that applies heuristics
to the tables and automatically chooses which one to stream (the larger)?
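
A minimal sketch of such a heuristic, written in Python and outside Hive itself. It assumes estimated row counts per table are already available, and it uses total row count as a rough proxy for "many rows sharing the same join key", which is a simplification. The function names and the sample counts are hypothetical; only the STREAMTABLE hint syntax comes from Hive.

```python
from typing import Dict, List


def order_for_join(row_counts: Dict[str, int]) -> List[str]:
    """Return table names ordered smallest first, so the largest ends up rightmost.

    Assumption: row_counts holds estimated row counts keyed by table name.
    Ties keep their original order because sorted() is stable.
    """
    return sorted(row_counts, key=lambda name: row_counts[name])


def streamtable_hint(row_counts: Dict[str, int]) -> str:
    """Build a STREAMTABLE hint naming the table with the largest estimate."""
    largest = max(row_counts, key=lambda name: row_counts[name])
    return f"/*+ STREAMTABLE({largest}) */"


# Hypothetical estimates: a is the big table, b is small.
estimates = {"a": 1_000_000, "b": 50}
join_order = order_for_join(estimates)
hint = streamtable_hint(estimates)
```

Either result could drive the plan: `join_order` corresponds to rewriting the query by hand, while `hint` corresponds to leaving the query as written and naming the table to stream.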
