hive-user mailing list archives

From John Sichi <jsi...@facebook.com>
Subject Re: Mapjoin parameters?
Date Thu, 19 Aug 2010 20:56:02 GMT
For hive.mapjoin.cache.numrows, I found this in hive/conf/hive-default.xml:

<property>
  <name>hive.mapjoin.cache.numrows</name>
  <value>25000</value>
  <description>How many rows should be cached by jdbm for map join. </description>
</property>
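
For reference, a minimal sketch of how this default could be overridden site-wide, assuming a hive-site.xml in the same conf directory. The value 100000 is purely illustrative, not a recommendation; the same property can also be changed for a single session with "set hive.mapjoin.cache.numrows=100000;" in the Hive CLI.

```xml
<!-- hive-site.xml: entries here take precedence over hive-default.xml -->
<property>
  <name>hive.mapjoin.cache.numrows</name>
  <!-- illustrative value only; the shipped default is 25000 -->
  <value>100000</value>
  <description>How many rows should be cached by jdbm for map join.</description>
</property>
```

Caching more rows will likely need more heap in the map tasks, so it may be worth watching memory usage after raising this.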

hive.mapjoin.size is missing from hive-default.xml; can you create a JIRA issue for that?

JVS

On Aug 19, 2010, at 1:07 AM, Ted Xu wrote:

Hi all,

I found two parameters that are related to map join:

hive.mapjoin.cache.numrows
hive.mapjoin.size.key

I can't find any formal documentation on these two parameters.

I guess "hive.mapjoin.cache.numrows" sets the maximum row count of the small table in a map
join, and that rows beyond that limit are discarded. Once, when I used a map join with a table of
50000+ rows, some records were not joined, and I fixed the problem by increasing "hive.mapjoin.cache.numrows".

However, I sometimes still get an OOM exception even when the "hive.mapjoin.cache.numrows" parameter
is not set (so it uses the default, 25000 I guess).

Please explain the usage of these parameters if you know it. Thanks.

--
Best Regards,
Ted Xu
