hive-user mailing list archives

From Zheng Shao <zsh...@gmail.com>
Subject Re: Issues with joining across large tables
Date Mon, 26 Oct 2009 06:08:07 GMT
It's probably caused by the Cartesian product of many rows from the two
tables with the same key.
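
To make that concrete: an inner equi-join emits, for every key present in both tables, (rows with that key on the left) x (rows with that key on the right). A rough way to check whether a few hot keys explain the row counts in the reducer logs is to compare per-key counts from both sides. The sketch below is only an illustration in plain Python, not Hive code; the function names `estimated_join_rows` and `hottest_keys` are made up for this example, and in practice the per-key counts would come from a GROUP BY on the join column of each table.

```python
from collections import Counter


def estimated_join_rows(left_keys, right_keys):
    """Rows an inner equi-join would emit.

    For each key present on both sides, the join produces
    left_count * right_count rows (a per-key Cartesian product).
    """
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(n * right[k] for k, n in left.items() if k in right)


def hottest_keys(left_keys, right_keys, top=3):
    """Keys that contribute the most output rows, largest first."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    products = {k: left[k] * right[k] for k in left if k in right}
    return sorted(products.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical data: key "a" is heavily duplicated on both sides.
    left = ["a"] * 1000 + ["b", "c"]
    right = ["a"] * 2000 + ["b"]
    print(estimated_join_rows(left, right))  # 1000 * 2000 + 1 * 1
    print(hottest_keys(left, right, top=1))
```

If one or two keys dominate the total, the reducers that receive those keys will keep forwarding rows long after the others finish, which would match the symptom described below.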

Zheng

On Sun, Oct 25, 2009 at 7:22 PM, Ryan LeCompte <lecompte@gmail.com> wrote:

> It also looks like the reducers just never stop outputting things like the
> following (see below), causing them to ultimately time out and get
> killed by the system.
>
> 2009-10-25 22:21:18,879 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 100000000 rows
> 2009-10-25 22:21:22,009 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 101000000 rows
> 2009-10-25 22:21:22,010 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 101000000 rows
> 2009-10-25 22:21:25,141 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 102000000 rows
> 2009-10-25 22:21:25,142 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 102000000 rows
> 2009-10-25 22:21:28,263 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 103000000 rows
> 2009-10-25 22:21:28,263 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 103000000 rows
> 2009-10-25 22:21:31,387 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104000000 rows
> 2009-10-25 22:21:31,387 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 104000000 rows
> 2009-10-25 22:21:34,510 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 105000000 rows
> 2009-10-25 22:21:34,510 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 105000000 rows
> 2009-10-25 22:21:37,633 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 106000000 rows
> 2009-10-25 22:21:37,633 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 106000000 rows
>
>
>
> On Sun, Oct 25, 2009 at 9:39 PM, Ryan LeCompte <lecompte@gmail.com> wrote:
>
>> Hello all,
>>
>> Should I expect to be able to do a Hive JOIN between two tables that have
>> about 10 or 15GB of data each? What I'm noticing (for a simple JOIN) is that
>> all the map tasks complete, but the reducers just hang at around 87% or so
>> (for the first set of 4 reducers), and then they eventually get killed
>> by the cluster for failing to respond. I can do a JOIN between a large
>> table and a very small table of 10 or so records just fine.
>>
>> Any thoughts?
>>
>> Thanks,
>> Ryan
>>
>>
>


-- 
Yours,
Zheng
