From: Namit Jain <njain@facebook.com>
To: hive-user@hadoop.apache.org
CC: Ryan LeCompte <lecompte@gmail.com>
Date: Tue, 10 Nov 2009 21:07:07 -0800
Subject: Re: Self join problem

I think you missed the attachment.


Which job is taking more time – the join or the group by?

Can you send the data characteristics for m1 and foo1 – is it possible that there is a large skew on aid and dt which is forcing the data to be sent to a single reducer?
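For example, a rough skew check (just a sketch, assuming the m1 schema quoted below) would be:

select aid, dt, count(1) as cnt
from m1
group by aid, dt
order by cnt desc
limit 20;

If a few (aid, dt) pairs account for most of the rows, the join on those keys will funnel almost everything through one reducer.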



-namit



On 11/10/09 6:35 PM, "Defenestrator" <defenestrationism@gmail.com> wrote:

I would definitely appreciate any insights on this from the list. I tried to reduce the query down to something that is easily understood, and hive still shows pretty poor join performance on a three-node hadoop cluster.

drop table m1;
drop table foo1;

create table m1 (
mid int,
aid int,
dt string);

LOAD DATA LOCAL INPATH 'm1' OVERWRITE INTO TABLE m1;

create table foo1 (
aid_1 int,
aid_2 int,
mid bigint,
dt bigint
);

set mapred.reduce.tasks=32;

insert overwrite table foo1
select m1.aid as aid_1, m2.aid as aid_2, count(1), m1.dt as dt
from m1 m1 join m1 m2 on m1.aid = m2.aid and m1.dt = m2.dt group by m1.aid, m2.aid, m1.dt;

Attached is the file I'm using, which only has 100k rows. I've looked at the benchmark (http://issues.apache.org/jira/secure/attachment/12411185/hive_benchmark_2009-06-18.pdf) and hive seems to be able to join much bigger data sets. I also tried running the same query on a single-node dbms on my desktop, and it returns results in less than 3 minutes, while hive has been running for at least 20 minutes now.

Thanks.

On Tue, Nov 10, 2009 at 3:53 PM, Ryan LeCompte <lecompte@gmail.com> wrote:
Any thoughts on this? I've only had luck by reducing the data on each side of the join. Is this something Hive might be able to improve in a future release of the query plan optimizer?

Thanks,
Ryan



On Nov 3, 2009, at 10:55 PM, Ryan LeCompte <lecompte@gmail.com> wrote:

I've had a similar issue with a small cluster. Is there any way that you can reduce the size of the data being joined on both sides? If you search the forums for "join issue", you will see the thread for my issue and get some tips.

Thanks,
Ryan



On Nov 3, 2009, at 10:45 PM, Defenestrator <defenestrationism@gmail.com> wrote:

I was able to increase the number of reduce jobs manually to 32. However, it finishes 28 of them, and the other 4 have the same behavior of using 100% cpu and consuming a lot of memory. I'm suspecting that it might be an issue with the reduce job itself - is there a way to figure out what these jobs are doing exactly?

Thanks.

On Tue, Nov 3, 2009 at 6:53 PM, Namit Jain <njain@facebook.com> wrote:
The number of reducers is inferred from the input data size, but you can always override it by setting mapred.reduce.tasks.
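For example (an arbitrary value, purely to illustrate the setting):

set mapred.reduce.tasks=32;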

From: Defenestrator [mailto:defenestrationism@gmail.com]
Sent: Tuesday, November 03, 2009 6:46 PM
To: hive-user@hadoop.apache.org
Subject: Re: Self join problem

Hi Namit,


Thanks for your suggestion.


I tried changing the query as you had suggested by moving the m1.dt = m2.dt to the on clause. It increased the number of reduce jobs to 2. So now there are two processes running on two nodes at 100% cpu, consuming a lot of memory. Is there a reason why hive doesn't spawn more reduce jobs for this query?


On Tue, Nov 3, 2009 at 4:47 PM, Namit Jain <njain@facebook.com> wrote:

Put the join condition in the on clause (Hive uses only the equality predicates in the on clause as join keys; the remaining predicates stay in the where clause):

insert overwrite table foo1 select m1.id as id_1, m2.id as id_2, count(1), m1.dt
from m1 join m2 on m1.dt = m2.dt where m1.id <> m2.id and m1.id < m2.id group by m1.id, m2.id, m1.dt;
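A quick way to sanity-check what Hive plans to do here (a sketch against the same tables) is to prefix the statement with explain and see whether the join and the group by land in separate map-reduce stages:

explain
insert overwrite table foo1 select m1.id as id_1, m2.id as id_2, count(1), m1.dt
from m1 join m2 on m1.dt = m2.dt where m1.id <> m2.id and m1.id < m2.id group by m1.id, m2.id, m1.dt;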

From: Defenestrator [mailto:defenestrationism@gmail.com]
Sent: Tuesday, November 03, 2009 4:44 PM
To: hive-user@hadoop.apache.org
Subject: Self join problem

Hello,


I'm trying to run the following query where m1 and m2 have the same data (>29M rows) on a 3-node hadoop cluster. I'm essentially trying to do a self join. It ends up running 269 map jobs and 1 reduce job. The map jobs complete, but the reduce job just runs in one process on one of the hadoop nodes at 100% cpu utilization and slowly increases in memory consumption. The reduce job never goes beyond 82% complete despite letting it run for a day.


I am running on 0.5.0 based on this morning's trunk.


insert overwrite table foo1

select m1.id as id_1, m2.id as id_2, count(1), m1.dt

from m1 join m2 where m1.id <> m2.id and m1.id < m2.id and m1.dt = m2.dt group by m1.id, m2.id, m1.dt;


Any input would be appreciated.


