From: Matt Pestritto
To: hive-user@hadoop.apache.org
Date: Thu, 23 Apr 2009 15:44:07 -0400
Subject: Join Issue with Multiple Reducers
Message-ID: <42a94690904231244h30a05ddq286dbda1fe7d53c6@mail.gmail.com>

Hi.

I wanted to ask if anyone has seen the following behavior in Hive.

When I execute a cross join (a join with no ON clause) across multiple reducers, the output contains only 1/<number of reducers> of the expected rows. E.g. I have 1 million rows in one table and 1 row in another table and do a join with no ON clause. If this is executed with 5 reducers, I get 200K rows out instead of 1M. If I change the join to an outer join, I get 1M rows output, but only 1/5 of the rows have values from the joined table.

select a.col1, b.col1 from tbl1 a join tbl2 b ;
tbl1 has 1M records.
tbl2 has 1 record.
There are only 200K records output if run across 5 reducers.

If I change to an outer join :
select a.col1, b.col1 from tbl1 a left outer join tbl2 b ;
There are 1M records output, but only 200K of them have a value for b.col1. The others have a null.

My workaround is just running with 1 reducer, but it took me a while to figure this out.

Thanks.
-Matt
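
The row counts reported above can be reproduced with a small toy model. This is only a sketch under a stated assumption, not a description of Hive's internals: it assumes that, with no join key, rows from each table are spread across the reducers (round-robin here) and that each reducer joins only the rows it happened to receive. The function name `simulate_keyless_join` and its parameters are hypothetical and exist only for this illustration.

```python
def simulate_keyless_join(n_left, n_right, n_reducers, outer=False):
    """Toy model of a join with no ON clause run on several reducers.

    ASSUMPTION (not confirmed by the post): rows are dealt round-robin to
    reducers, and each reducer only sees its own share of both tables.
    Returns (rows_output, rows_with_a_right_side_value).
    """
    def shares(n_rows):
        # Number of rows each reducer receives under round-robin dealing.
        base, extra = divmod(n_rows, n_reducers)
        return [base + (1 if k < extra else 0) for k in range(n_reducers)]

    rows_out = 0
    rows_matched = 0
    for left_count, right_count in zip(shares(n_left), shares(n_right)):
        if right_count > 0:
            # Local cross product on this reducer.
            rows_out += left_count * right_count
            rows_matched += left_count * right_count
        elif outer:
            # Left outer join: unmatched left rows are kept with NULLs.
            rows_out += left_count
    return rows_out, rows_matched


# Inner join, 5 reducers: only the reducer holding the single tbl2 row emits.
print(simulate_keyless_join(1_000_000, 1, 5))              # (200000, 200000)
# Left outer join, 5 reducers: all rows come out, 4/5 of them with NULLs.
print(simulate_keyless_join(1_000_000, 1, 5, outer=True))  # (1000000, 200000)
# One reducer (the workaround): every row meets the tbl2 row.
print(simulate_keyless_join(1_000_000, 1, 1))              # (1000000, 1000000)
```

Under this assumed model, the single tbl2 row lands on exactly one reducer, which is consistent with both the 200K inner-join result and the 200K non-null values in the outer join.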