From: Jie Li
To: dev@pig.apache.org
Subject: Early projection and lazy casting
Date: Fri, 2 Dec 2011 19:05:59 -0500

Hi all,

We just found that Pig 0.9.1 doesn't drop unnecessary fields as soon as it could, which really affects performance. This is despite http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html#loadfunc_loaderpushdown saying that "As part of its optimizations Pig analyzes Pig Latin scripts and determines what fields in an input it needs at each step in the script. It uses this information to aggressively drop fields it no longer needs."

We also found that Pig casts the data to the types declared in the schema, which is usually unnecessary because most of those fields are dropped soon afterwards.

To work around both issues, we have to manually drop those fields and remove the types from the schema, neither of which is interesting work.

Jie
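
For illustration, the manual workaround described above (drop unneeded fields right after the load, and leave types out of the load schema so that only the surviving fields are cast) might look roughly like the following Pig Latin. This is a minimal sketch, not the script from the original report: the input and output paths, the tab delimiter, the field positions, and the aliases `user_id` and `num_bytes` are all hypothetical.

```pig
-- Hypothetical wide input; no AS clause, so every field stays a bytearray
-- and nothing is cast at load time.
raw = LOAD 'input/events.tsv' USING PigStorage('\t');

-- Project early: keep only the two fields the rest of the script uses.
needed = FOREACH raw GENERATE $0 AS user_id, $3 AS num_bytes;

-- Cast late: convert only the field that actually needs a type.
typed = FOREACH needed GENERATE user_id, (long)num_bytes AS num_bytes;

grouped = GROUP typed BY user_id;
totals  = FOREACH grouped GENERATE group AS user_id, SUM(typed.num_bytes) AS total_bytes;

STORE totals INTO 'output/totals';
```

Running such a script with and without the explicit early projection, and with and without types in the load schema, is one way to check whether the optimizer is already doing this pruning for a given script.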