hive-dev mailing list archives

From Mehant Baid <baid.meh...@gmail.com>
Subject Re: Bug in map join optimization causing "OutOfMemory" error
Date Fri, 01 Nov 2013 07:23:05 GMT
Hey Folks,

Could you please take a look at the problem below? We are hitting 
OutOfMemoryErrors while joining tables that are not managed by Hive.

Would appreciate any feedback.

Thanks
Mehant
On 10/7/13 12:04 PM, Mehant Baid wrote:
> Hey Folks,
>
> We are using hive-0.11 and are hitting java.lang.OutOfMemoryError. The 
> problem seems to be in CommonJoinResolver.java (processCurrentTask()). 
> This function tries to convert a map-reduce join to a map join if 
> 'n-1' of the tables involved in an 'n'-way join have a size below a 
> certain threshold.
>
> If the tables are maintained by Hive then we have accurate sizes for 
> each table and can apply this optimization, but if the tables are 
> created using storage handlers (HBaseStorageHandler in our case) then 
> the size is set to zero. Because of this we assume that we can apply 
> the optimization and convert the map-reduce join to a map join. We 
> then build an in-memory hash table for all the keys. Since our table 
> created using the storage handler is large, it does not fit in memory 
> and we hit the error.
>
> Should I open a JIRA for this? One way to fix this is to set the size 
> of the table (created using storage handler) to be equal to the map 
> join threshold. This way the table would be selected as the big table 
> and we can proceed with the optimization if other tables in the join 
> have size below the threshold. If we have multiple big tables then the 
> optimization would be turned off.
>
> Thanks
> Mehant
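
For illustration, the fix proposed above could be sketched roughly as 
below. This is a simplified stand-in and not the actual 
CommonJoinResolver code: the class MapJoinSketch, the method 
chooseBigTable, and the sizeUnknown flags are hypothetical names made up 
for this example, and the threshold value in main is only a placeholder.

```java
public class MapJoinSketch {

    /**
     * Size used for the map-join decision. Assumption: a table whose size
     * is unknown (e.g. created through a storage handler) is treated as
     * being exactly at the threshold, so it never counts as "small".
     */
    static long effectiveSize(long reportedSize, boolean sizeUnknown, long threshold) {
        return sizeUnknown ? threshold : reportedSize;
    }

    /**
     * Returns the index of the table to use as the big (streamed) table,
     * or -1 if the join should stay a regular map-reduce join.
     */
    static int chooseBigTable(long[] sizes, boolean[] sizeUnknown, long threshold) {
        int big = -1;
        for (int i = 0; i < sizes.length; i++) {
            long size = effectiveSize(sizes[i], sizeUnknown[i], threshold);
            if (size >= threshold) {
                if (big != -1) {
                    return -1; // more than one big table: optimization off
                }
                big = i;
            }
        }
        if (big != -1) {
            return big;
        }
        // Every table is below the threshold: stream the largest one.
        for (int i = 0; i < sizes.length; i++) {
            if (big == -1 || sizes[i] > sizes[big]) {
                big = i;
            }
        }
        return big;
    }

    public static void main(String[] args) {
        long threshold = 25_000_000L; // placeholder value for this sketch
        // Table 0: storage-handler table reporting size 0; table 1: small Hive table.
        long[] sizes = {0L, 1_000L};
        boolean[] unknown = {true, false};
        System.out.println(chooseBigTable(sizes, unknown, threshold));
    }
}
```

With this rule the storage-handler table is picked as the big table 
instead of being loaded into the in-memory hash table, and a join with 
two such tables is left as a map-reduce join.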

