Return-Path: Delivered-To: apmail-hive-dev-archive@www.apache.org Received: (qmail 18498 invoked from network); 20 Mar 2011 14:56:53 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Mar 2011 14:56:53 -0000 Received: (qmail 13539 invoked by uid 500); 20 Mar 2011 14:56:53 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 13495 invoked by uid 500); 20 Mar 2011 14:56:52 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 13478 invoked by uid 99); 20 Mar 2011 14:56:52 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 20 Mar 2011 14:56:52 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of edlinuxguru@gmail.com designates 209.85.161.48 as permitted sender) Received: from [209.85.161.48] (HELO mail-fx0-f48.google.com) (209.85.161.48) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 20 Mar 2011 14:56:47 +0000 Received: by fxm7 with SMTP id 7so6311689fxm.35 for ; Sun, 20 Mar 2011 07:56:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=/2MdOSHjT4IdZRBmaZmgOAe+qmiDgY/R+odkPIfQYHg=; b=N6qunXhxttLwVtETXyKcB+n9K7CH+FysoiSyshZ7Yj6mgIBP1Fh9JAZb9AXMCg+lyD Hds43b7xQVFsMYYbduwSEafwanTNBtlewmu7flrmA9HnI5CHUbE7SXIjMq92V4345Daa etEdTXC4pc9u75Pwd3VvvFEOC0Ci8GYXiXIUY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=migiMEzrdZxodR0IIKFwNMseXG1KZmyfY4/I8j+e+LUbW3aYeUmk8A43wFNr7xJXwS XKXivEhIQKO3AzkFstLlrOzxrNAoSPyuFGyhh3Upt9kl74Vb8rMHTUuXf5Jfpt9kbhsy WtW5DBgy0Jb/jSTvJNFplwDQ6RxrhizY4yNkE= MIME-Version: 1.0 Received: by 10.223.35.147 with SMTP id p19mr3738096fad.13.1300632987042; Sun, 20 Mar 2011 07:56:27 -0700 (PDT) Received: by 10.223.125.211 with HTTP; Sun, 20 Mar 2011 07:56:27 -0700 (PDT) In-Reply-To: References: Date: Sun, 20 Mar 2011 10:56:27 -0400 Message-ID: Subject: Re: skew join optimization From: Edward Capriolo To: user@hive.apache.org Cc: Ted Yu , dev@hive.apache.org, bharath vissapragada Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu wrote: > Can someone re-attach the missing figures for that wiki ? > > Thanks > > On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada > wrote: >> >> Hi Igor, >> >> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see the >> jira 1642 which automatically converts a normal join into map-join >> (Otherwise you can specify the mapjoin hints in the query itself.). >> Because your 'S' table is very small , it can be replicated across all >> the mappers and the reduce phase can be avoided. This can greatly >> reduce the runtime .. (See the results section in the page for >> details.). >> >> Hope this helps. >> >> Thanks >> >> >> On Sun, Mar 20, 2011 at 6:37 PM, Jov wrote: >> > 2011/3/20 Igor Tatarinov : >> >> I have the following join that takes 4.5 hours (with 12 nodes) mostly >> >> because of a single reduce task that gets the bulk of the work: >> >> SELECT ... >> >> FROM T >> >> LEFT OUTER JOIN S >> >> ON T.timestamp =3D S.timestamp and T.id =3D S.id >> >> This is a 1:0/1 join so the size of the output is exactly the same as >> >> the >> >> size of T (500M records). S is actually very small (5K). >> >> I've tried: >> >> - switching the order of the join conditions >> >> - using a different hash function setting (jenkins instead of murmur) >> >> - using SET set hive.auto.convert.join =3D true; >> > >> > are you sure your query convert to mapjoin? if not,try use explicit >> > mapjoin hint. >> > >> > >> >> - using SET=A0hive.optimize.skewjoin =3D true; >> >> but nothing helped :( >> >> Anything else I can try? >> >> Thanks! >> > >> >> >> >> -- >> Regards, >> Bharath .V >> w:http://research.iiit.ac.in/~bharath.v > > The wiki does not allow images, confluence does but we have not moved their= yet.