hive-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: hive mapjoin decision process
Date Tue, 19 Jul 2011 20:55:18 GMT
thanks!
i only see hive create the hashmap dump and perform mapjoin if both tables
are small. i thought only one table needed to be small?

i am trying to join a very large table with a small table. i made sure
hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize are set large
enough to accommodate the small table. yet hive does not attempt to do a
mapjoin. does hive.mapjoin.maxsize also apply to the big table? or do i need
to look at other parameters as well?

On Tue, Jul 19, 2011 at 4:15 PM, yongqiang he <heyongqiangict@gmail.com> wrote:

> in most cases, the mapjoin falls back to a normal join for one of
> these three reasons:
> 1) the input table size is very big, so mapjoin will not be attempted
> 2) if one of the input tables is small (let's say less than 25MB, which
> is configurable), hive will try a local hashmap dump. If it causes an OOM
> on the client side when doing the local hashmap dump, it will go back
> to a normal join. The reason here is mostly very good compression on
> the input data.
> 3) the mapjoin actually got started, and fails. it will fall back to a
> normal join. This is very unlikely to happen
>
> Thanks
> Yongqiang
> On Tue, Jul 19, 2011 at 11:16 AM, Koert Kuipers <koert@tresata.com> wrote:
> > note: this is somewhat a repost of something i posted on the CDH3 user
> > group. apologies if that is not appropriate.
> >
> > i am exploring map-joins in hive. with hive.auto.convert.join=true hive
> > tries to do a map-join and then falls back on a mapreduce-join if certain
> > conditions are not met. this sounds great. but when i do a
> > query and i notice it falls back on a mapreduce-join, how can i see which
> > condition triggered the fallback (smalltable.filesize or mapjoin.maxsize or
> > something else perhaps memory related)?
> >
> > i tried reading the default log that a hive session produces, but it seems
> > more like a massive json file than a log to me, so it is very hard for me to
> > interpret that. i also turned on logging to console with debugging, looking
> > for any clues there but without luck so far. is the info there and am i just
> > overlooking it? any ideas?
> >
> > thanks! koert
> >
> >
> >
>
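
The three fallback reasons quoted above can be read as a small decision
procedure. The Python sketch below is only an illustration of that reading,
not Hive's actual code: the function name, the parameters and the 25 MB
constant are assumptions made for the example, and it assumes that only one
input has to be under the threshold, which is the very point the reply at the
top of this message is asking about.

```python
from dataclasses import dataclass
from typing import List

# Assumed threshold, taken from "let's say less than 25MB which is configurable".
SMALL_TABLE_BYTES = 25 * 1000 * 1000


@dataclass
class JoinPlan:
    strategy: str  # "mapjoin" or "common join"
    reason: str    # why this strategy was picked


def plan_join(table_sizes: List[int],
              local_dump_ok: bool = True,
              mapjoin_task_ok: bool = True,
              small_limit: int = SMALL_TABLE_BYTES) -> JoinPlan:
    """Hypothetical model of the fallback order described in the thread."""
    # Reason 1: no input is small enough, so mapjoin is never attempted.
    if not any(size < small_limit for size in table_sizes):
        return JoinPlan("common join", "no input is below the small-table threshold")
    # Reason 2: the client-side hashmap dump of the small table ran out of memory.
    if not local_dump_ok:
        return JoinPlan("common join", "local hashmap dump ran out of memory")
    # Reason 3: the mapjoin started but failed at run time.
    if not mapjoin_task_ok:
        return JoinPlan("common join", "mapjoin task failed at run time")
    return JoinPlan("mapjoin", "small table was hashed locally")


if __name__ == "__main__":
    # One very large table joined with one small table.
    print(plan_join([10_000_000_000, 5_000_000]))
    # Same inputs, but the small table expands badly once decompressed.
    print(plan_join([10_000_000_000, 5_000_000], local_dump_ok=False))
```

Under this reading a big-table/small-table join should take the mapjoin path
unless the local dump or the task itself fails, so a fallback with a small
table present would point at reason 2 or 3 rather than at the size settings.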
