Date: Fri, 2 Dec 2011 19:45:57 -0500
Subject: Re: Early projection and lazy casting
From: Jie Li
To: dev@pig.apache.org

Why would joins prevent early projection? If anything, joins have the greatest need for it.

Jie

On Fri, Dec 2, 2011 at 7:33 PM, Jonathan Coveney wrote:

> In what context? I always thought that it generally does, but that it
> doesn't if you do joins. I would be curious to hear more from someone who
> knows...
>
> 2011/12/2 Jie Li
>
> > Hi all,
> >
> > We just found that Pig 0.9.1 doesn't drop unnecessary fields as early as
> > possible, which really hurts performance, even though
> > http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown
> > says that "As part of its optimizations Pig analyzes Pig Latin scripts and
> > determines what fields in an input it needs at each step in the script. It
> > uses this information to aggressively drop fields it no longer needs."
> >
> > We also found that Pig casts the data into the types defined in the schema,
> > which is usually unnecessary, as most of those fields will soon be dropped.
> >
> > To work around both issues, we have to manually drop those fields and
> > remove the types from the schema, which is tedious.
> >
> > Jie
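A minimal Pig Latin sketch of the manual workaround described in the quoted message, under stated assumptions. The input is loaded without declared types, so every field stays bytearray and nothing is cast at load time. Only the needed columns are projected immediately, and only those are cast, before the join. The paths, field names, and tab delimiter are hypothetical and do not come from the original report.

```pig
-- Hypothetical input: tab-delimited events with five columns.
-- No types in the AS clause, so every field is loaded as bytearray
-- and nothing is cast at load time.
events = LOAD 'input/events' USING PigStorage('\t')
         AS (user_id, ts, url, referrer, agent);

-- Project the two columns the rest of the script needs right away,
-- casting only those.
events_slim = FOREACH events GENERATE (chararray) user_id AS user_id, (long) ts AS ts;

-- Hypothetical second input, handled the same way.
users = LOAD 'input/users' USING PigStorage('\t') AS (user_id, country);
users_slim = FOREACH users GENERATE (chararray) user_id AS user_id, (chararray) country AS country;

-- The join now carries only the projected, typed columns.
joined = JOIN events_slim BY user_id, users_slim BY user_id;
STORE joined INTO 'output/joined';
```

Putting the casts in the FOREACH rather than in the LOAD schema means only the surviving columns are converted. The unused url, referrer, and agent fields are never cast, and they are not carried into the join.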