Subject: Re: hive mapjoin decision process
From: Koert Kuipers
To: user@hive.apache.org
Date: Tue, 19 Jul 2011 18:57:53 -0400

thanks.
changing mapred.child.java.opts from -Xmx512m to -Xmx1024m did the trick, allocating more memory to the local hashmap dump.

On Tue, Jul 19, 2011 at 6:49 PM, yongqiang he wrote:
> >> i thought only one table needed to be small?
> Yes.
>
> >> hive.mapjoin.maxsize also apply to big table?
> No.
>
> >> i made sure hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize
> are set large enough to accommodate the small table. yet hive does not
> attempt to do a mapjoin.
>
> There are physical limitations. If the local machine cannot hold all
> records in memory locally, the local hashmap has to fail. So check
> your machine's memory or the memory allocated for hive.
>
> Thanks
> Yongqiang
>
> On Tue, Jul 19, 2011 at 1:55 PM, Koert Kuipers wrote:
> > thanks!
> > i only see hive create the hashmap dump and perform a mapjoin if both
> > tables are small. i thought only one table needed to be small?
> >
> > i am trying to join a very large table with a small table. i made sure
> > hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize are set large
> > enough to accommodate the small table. yet hive does not attempt to do a
> > mapjoin. does hive.mapjoin.maxsize also apply to the big table? or do i
> > need to look at other parameters as well?
> >
> > On Tue, Jul 19, 2011 at 4:15 PM, yongqiang he wrote:
> >> in most cases, the mapjoin falls back to a normal join for one of
> >> these three reasons:
> >> 1) the input table size is very big, so there will be no attempt at a
> >> mapjoin.
> >> 2) if one of the input tables is small (say, less than 25MB, which
> >> is configurable), hive will try a local hashmap dump. If that causes an
> >> OOM on the client side, it falls back to a normal join. The cause here
> >> is mostly very good compression on the input data.
> >> 3) the mapjoin actually got started and failed; it will fall back to a
> >> normal join. This is very unlikely to happen.
> >>
> >> Thanks
> >> Yongqiang
> >>
> >> On Tue, Jul 19, 2011 at 11:16 AM, Koert Kuipers wrote:
> >> > note: this is somewhat a repost of something i posted on the CDH3
> >> > user group. apologies if that is not appropriate.
> >> >
> >> > i am exploring map-joins in hive. with hive.auto.convert.join=true
> >> > hive tries to do a map-join and then falls back on a mapreduce-join
> >> > if certain conditions are not met. this sounds great. but when i do
> >> > a query and notice it falls back on a mapreduce-join, how can i see
> >> > which condition triggered the fallback (smalltable.filesize or
> >> > mapjoin.maxsize or something else, perhaps memory related)?
> >> >
> >> > i tried reading the default log that a hive session produces, but it
> >> > seems more like a massive json file than a log to me, so it is very
> >> > hard for me to interpret. i also turned on logging to console with
> >> > debugging, looking for any clues there, but without luck so far. is
> >> > the info there and am i just overlooking it? any ideas?
> >> >
> >> > thanks! koert
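
For reference, the knobs discussed in this thread can be set per session. The sketch below is illustrative only: the values are not recommendations, and the table names big_t/small_t are hypothetical. The MAPJOIN hint at the end is just one way to request a map-side join explicitly rather than relying on hive.auto.convert.join.

```sql
-- Session-level settings discussed in this thread (values illustrative).
SET hive.auto.convert.join=true;                -- let hive attempt a mapjoin automatically
SET hive.mapjoin.smalltable.filesize=25000000;  -- small-table threshold in bytes (the ~25MB mentioned above)
SET hive.mapjoin.maxsize=1000000;               -- cap on rows held in the local hashmap
SET mapred.child.java.opts=-Xmx1024m;           -- the fix from this thread: more heap for the child JVM

-- A mapjoin can also be requested explicitly with a hint (hypothetical tables):
SELECT /*+ MAPJOIN(small_t) */ big_t.id, small_t.name
FROM big_t JOIN small_t ON (big_t.id = small_t.id);
```

Note that the local hashmap dump runs in the client-side JVM, which is why raising the child JVM heap (rather than any table-size threshold) resolved the OOM-driven fallback described above.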