hive-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: hive mapjoin decision process
Date Tue, 19 Jul 2011 22:57:53 GMT
thanks.
changing mapred.child.java.opts from -Xmx512m to -Xmx1024m did the trick
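For anyone reproducing this, here is a small Python sketch that applies the same overrides for a single session by passing them to the Hive CLI as `--hiveconf key=value` pairs. It only builds the argument list and does not run Hive. The helper name `hive_command` and the table and column names in the query are hypothetical; the two property values are the ones discussed in this thread.

```python
import shlex


def hive_command(query, settings):
    """Build an argv list for the Hive CLI with per-session --hiveconf overrides."""
    args = ["hive"]
    for key, value in settings.items():
        # Each override becomes its own "--hiveconf key=value" pair.
        args += ["--hiveconf", f"{key}={value}"]
    # -e runs the given query string and exits.
    args += ["-e", query]
    return args


if __name__ == "__main__":
    settings = {
        "hive.auto.convert.join": "true",
        "mapred.child.java.opts": "-Xmx1024m",  # previously -Xmx512m
    }
    # Hypothetical table and column names, for illustration only.
    query = "SELECT b.id, s.label FROM big_table b JOIN small_table s ON (b.id = s.id)"
    # Print a shell-quoted command line; pass the list to subprocess.run to execute it.
    print(shlex.join(hive_command(query, settings)))
```

The same properties can also be set inside an interactive session with `SET`; the CLI form just makes the override explicit per invocation.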


allocating more memory to the

On Tue, Jul 19, 2011 at 6:49 PM, yongqiang he <heyongqiangict@gmail.com> wrote:

> >> i thought only one table needed to be small?
> Yes.
>
> >> hive.mapjoin.maxsize also apply to big table?
> No.
>
> >> i made sure hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize
> >> are set large enough to accommodate the small table. yet hive does not
> >> attempt to do a mapjoin.
>
> There are physical limitations. If the local machine cannot hold all
> records of the small table in memory, the local hashmap dump will fail.
> So check your machine's memory or the memory allocated for hive.
>
> Thanks
> Yongqiang
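To make the memory point concrete, here is a rough Python sketch of the kind of back-of-the-envelope check implied above: compare an estimated in-memory hashmap size against the `-Xmx` heap limit. The `expansion` ratio (in-memory size over on-disk size) and `usable_fraction` of the heap are assumed, illustrative numbers, not Hive settings, and the function names are hypothetical. It also assumes the relevant heap is the one set by `-Xmx` in the java opts string, as with the `mapred.child.java.opts` change that resolved this thread.

```python
import re

# Multipliers for the JVM size suffixes accepted by -Xmx (case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(java_opts):
    """Return the -Xmx value from a java opts string such as '-Xmx512m'."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not matches:
        raise ValueError("no -Xmx setting found in %r" % java_opts)
    # If -Xmx is repeated, the JVM uses the last occurrence.
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


def hashmap_probably_fits(file_bytes, java_opts, expansion=10.0, usable_fraction=0.5):
    """Rough guess at whether the small table's hashmap fits in the heap.

    expansion: assumed ratio of in-memory hashmap size to on-disk file size.
      Well-compressed input plus per-object overhead can make this large.
    usable_fraction: assumed share of the heap left for the hashmap.
    Both defaults are illustrative guesses only.
    """
    estimated = file_bytes * expansion
    return estimated <= max_heap_bytes(java_opts) * usable_fraction


if __name__ == "__main__":
    # A 20MB file that expands 20x in memory, with the two heap sizes from this thread.
    for opts in ("-Xmx512m", "-Xmx1024m"):
        print(opts, hashmap_probably_fits(20_000_000, opts, expansion=20.0))
```

With these assumed ratios, the 20MB file does not fit under `-Xmx512m` but does under `-Xmx1024m`, which mirrors the outcome reported in this thread; the real expansion factor depends on the data and file format.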
> On Tue, Jul 19, 2011 at 1:55 PM, Koert Kuipers <koert@tresata.com> wrote:
> > thanks!
> > i only see hive create the hashmap dump and perform mapjoin if both
> > tables are small. i thought only one table needed to be small?
> >
> > i am trying to merge a very large table with a small table. i made sure
> > hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize are set large
> > enough to accommodate the small table. yet hive does not attempt to do a
> > mapjoin. does hive.mapjoin.maxsize also apply to the big table? or do i
> > need to look at other parameters as well?
> >
> > On Tue, Jul 19, 2011 at 4:15 PM, yongqiang he <heyongqiangict@gmail.com> wrote:
> >>
> >> in most cases, the mapjoin falls back to a normal join for one of
> >> these three reasons:
> >> 1) the input tables are all very big, so a mapjoin is never attempted.
> >> 2) if one of the input tables is small (say less than 25MB, which
> >> is configurable), hive will try a local hashmap dump. If that causes an
> >> OOM on the client side, it will go back to a normal join. This is mostly
> >> due to very good compression on the input data, which makes the table
> >> much larger in memory than on disk.
> >> 3) the mapjoin actually gets started but fails, so it falls back to a
> >> normal join. This is very unlikely to happen.
> >>
> >> Thanks
> >> Yongqiang
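The three fallback reasons listed above can be read as an ordered decision. Below is a toy Python model of that order for a two-way join. It is not Hive's planner code: the enum, function, and parameter names are hypothetical, the 25MB figure is the configurable default mentioned in the message, and the two boolean flags stand in for outcomes that Hive only learns at run time.

```python
from enum import Enum


class JoinPlan(Enum):
    MAP_JOIN = "map join"
    COMMON_JOIN_NO_SMALL_TABLE = "common join: no input under the size threshold"
    COMMON_JOIN_LOCAL_OOM = "common join: local hashmap dump ran out of memory"
    COMMON_JOIN_TASK_FAILED = "common join: map join task failed"


# Roughly the 25MB default mentioned above (hive.mapjoin.smalltable.filesize).
DEFAULT_SMALLTABLE_FILESIZE = 25_000_000


def plan_two_way_join(left_bytes, right_bytes, local_dump_fits, task_succeeds,
                      smalltable_filesize=DEFAULT_SMALLTABLE_FILESIZE):
    """Toy model of the fallback order for a two-way join.

    left_bytes, right_bytes: on-disk input sizes. Only the smaller input has
      to be under the threshold; the larger one is streamed and can be any size.
    local_dump_fits: stand-in for "building the hashmap locally did not OOM".
    task_succeeds: stand-in for "the launched map join finished".
    """
    # Reason 1: neither input is small enough, so a map join is never tried.
    if min(left_bytes, right_bytes) >= smalltable_filesize:
        return JoinPlan.COMMON_JOIN_NO_SMALL_TABLE
    # Reason 2: the client-side hashmap dump of the small table failed.
    if not local_dump_fits:
        return JoinPlan.COMMON_JOIN_LOCAL_OOM
    # Reason 3: the map join started but failed at run time (rare).
    if not task_succeeds:
        return JoinPlan.COMMON_JOIN_TASK_FAILED
    return JoinPlan.MAP_JOIN


if __name__ == "__main__":
    big, small = 500 * 10**9, 10 * 10**6
    print(plan_two_way_join(big, small, local_dump_fits=True, task_succeeds=True))
    print(plan_two_way_join(big, small, local_dump_fits=False, task_succeeds=True))
```

The ordering matters when diagnosing a fallback: if the small input is already under the threshold, the next suspects are client memory during the local dump and, rarely, a failure of the map join task itself.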
> >> On Tue, Jul 19, 2011 at 11:16 AM, Koert Kuipers <koert@tresata.com> wrote:
> >> > note: this is somewhat a repost of something i posted on the CDH3 user
> >> > group. apologies if that is not appropriate.
> >> >
> >> > i am exploring map-joins in hive. with hive.auto.convert.join=true hive
> >> > tries to do a map-join and then falls back to a mapreduce-join if
> >> > certain conditions are not met. this sounds great. but when i run a
> >> > query and notice it falls back to a mapreduce-join, how can i see
> >> > which condition triggered the fallback (smalltable.filesize or
> >> > mapjoin.maxsize or something else, perhaps memory related)?
> >> >
> >> > i tried reading the default log that a hive session produces, but it
> >> > seems more like a massive json file than a log to me, so it is very
> >> > hard for me to interpret. i also turned on logging to console with
> >> > debugging, looking for any clues there, but without luck so far. is the
> >> > info there and am i just overlooking it? any ideas?
> >> >
> >> > thanks! koert
> >> >
> >> >
> >> >
> >
> >
>
