Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E4C4EDC2B for ; Wed, 12 Dec 2012 23:33:00 +0000 (UTC) Received: (qmail 36323 invoked by uid 500); 12 Dec 2012 23:32:59 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 36278 invoked by uid 500); 12 Dec 2012 23:32:59 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 36265 invoked by uid 99); 12 Dec 2012 23:32:59 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Dec 2012 23:32:59 +0000 X-ASF-Spam-Status: No, hits=1.7 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of chen.song.82@gmail.com designates 209.85.223.177 as permitted sender) Received: from [209.85.223.177] (HELO mail-ie0-f177.google.com) (209.85.223.177) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Dec 2012 23:32:53 +0000 Received: by mail-ie0-f177.google.com with SMTP id k13so3123915iea.8 for ; Wed, 12 Dec 2012 15:32:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=8C6gLw9GcpA8+vHgdB7YpOGLhlx0xTed3bZlZzkyrKo=; b=eGTdQVauMNH9CJvJw//UPk/bUwLJVMrAd7jK7fgn6ZV5PLFoN2NUCY2H+Sv9D+z5w7 +4DjqjqcnqJdH1cmMSNIFZ4xw7YZucC1lWNOnQ6PDBRoF5QIdn4ooso3qKuEks0GqZFD x3LqZ1KmbZq/4khKJR2DLa3YHOANtzUYBQx/0yGYZKTQzMA+JtbLwXCmsXC2hhqVGXEV xaQWi4ZkV4fdr+tVNMEplsqBiV1MAJcVicur7JXVUoeRVgq0pkdOPDQzO9uvARKUZJJF 4SADRnLBM3sWZ0GEcTQ77qZAkABAEHiiCa4T1msgAateY01Jv6KM4y5LAEf/Tj/XdHWD /gjQ== MIME-Version: 1.0 Received: by 10.50.150.167 with SMTP id uj7mr15207195igb.33.1355355151870; Wed, 12 Dec 2012 15:32:31 -0800 (PST) Received: by 10.231.231.200 with HTTP; Wed, 12 Dec 2012 15:32:31 -0800 (PST) Date: Wed, 12 Dec 2012 18:32:31 -0500 Message-ID: Subject: map side join with group by From: Chen Song To: user@hive.apache.org Content-Type: multipart/alternative; boundary=f46d04339d405f28e404d0b03626 X-Virus-Checked: Checked by ClamAV on apache.org --f46d04339d405f28e404d0b03626 Content-Type: text/plain; charset=ISO-8859-1 I have a silly question on how Hive interpretes a simple query with both map side join and group by. Below query will translate into two jobs, with the 1st one as a map only job doing the join and storing the output in a intermediary location, and the 2nd one as a map-reduce job taking the output of the 1st job as input and doing the group by. SELECT /*+ MAPJOIN(d) */ table.a, sum(table2.b) from table LEFT OUTER JOIN table2 ON table.id = table2.id where hour = '2012-12-11 11' group by table.a Why can't this be done within a single map reduce job? As what I can see from the query plan is that all 2nd job mapper do is taking the 1st job's mapper output. -- Chen Song --f46d04339d405f28e404d0b03626 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable I have a silly question on how Hive interpretes a simple query with both ma= p side join and group by.

Below query will translate int= o two jobs, with the 1st one as a map only job doing the join and storing t= he output in a intermediary location, and the 2nd one=A0as a map-reduce job= =A0taking the output of the 1st job as input and doing the group by.

SELECT
/*+ MAPJOIN(d) */
= table.a, sum(table2.b)
from table
LEFT OUTER JOIN table= 2
where hour =3D '2012-12-11 11'
group by table.a

Why can't this be done within a single map reduc= e job? As what I can see from the query plan is that all 2nd job mapper do = is taking the 1st job's mapper output.

--
Chen Song


--f46d04339d405f28e404d0b03626--