Return-Path:
Delivered-To: apmail-incubator-pig-dev-archive@locus.apache.org
Received: (qmail 65573 invoked from network); 23 Jan 2008 15:52:09 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2)
    by minotaur.apache.org with SMTP; 23 Jan 2008 15:52:09 -0000
Received: (qmail 84483 invoked by uid 500); 23 Jan 2008 15:51:59 -0000
Delivered-To: apmail-incubator-pig-dev-archive@incubator.apache.org
Received: (qmail 84461 invoked by uid 500); 23 Jan 2008 15:51:59 -0000
Mailing-List: contact pig-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: pig-dev@incubator.apache.org
Delivered-To: mailing list pig-dev@incubator.apache.org
Received: (qmail 84450 invoked by uid 99); 23 Jan 2008 15:51:59 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Jan 2008 07:51:59 -0800
X-ASF-Spam-Status: No, hits=-96.7 required=10.0 tests=ALL_TRUSTED,FS_WILL_HELP
X-Spam-Check-By: apache.org
Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Jan 2008 15:51:42 +0000
Received: from brutus (localhost [127.0.0.1])
    by brutus.apache.org (Postfix) with ESMTP id 77DB4714187
    for ; Wed, 23 Jan 2008 07:51:34 -0800 (PST)
Message-ID: <30779866.1201103494487.JavaMail.jira@brutus>
Date: Wed, 23 Jan 2008 07:51:34 -0800 (PST)
From: "Shubham Chopra (JIRA)"
To: pig-dev@incubator.apache.org
Subject: [jira] Updated: (PIG-59) A new "ILLUSTRATE" command which will help people debug their pig programs
In-Reply-To: <9810504.1200056194405.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org

     [ https://issues.apache.org/jira/browse/PIG-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chopra updated PIG-59:
------------------------------

    Attachment:
ExGenPatch

Patch for the Example Generator. Contains the implementation of the example generator algorithms and the changes needed in PigHead to get it working.

> A new "ILLUSTRATE" command which will help people debug their pig programs
> --------------------------------------------------------------------------
>
>                 Key: PIG-59
>                 URL: https://issues.apache.org/jira/browse/PIG-59
>             Project: Pig
>          Issue Type: New Feature
>          Components: grunt
>            Reporter: Shubham Chopra
>         Attachments: ExGenPatch
>
>
> I propose to add a new "ILLUSTRATE" command to Pig, which will help people debug their Pig programs.
> The idea is to select a few example data items, and illustrate how they are transformed by the sequence of Pig commands in the user's program. I have an algorithm that can select an appropriate and concise set of example data items automatically. It does a better job than random sampling would do; for example, random sampling suffers from the drawback that selective operations such as filters or joins can eliminate *all* the sampled data items, giving you empty results, which is of no help in debugging.
> This "ILLUSTRATE" functionality will save people from having to test their Pig programs on large data sets, which has a long turnaround time and wastes system resources.
> Proposed Implementation:
> I will create a new package called org.apache.pig.exgen, which will contain the aforementioned algorithm. The algorithm uses the "Local" execution operators (it does not run on Hadoop), so as to generate illustrative example data in near-real-time for the user.
> For my algorithm to work properly, it needs to trace the "lineage" (sometimes called "provenance") of data items as they flow through the local operator tree corresponding to the user's Pig program. So I will have to add a "lineage tracer" to the Local operators, which maintains a side data structure to represent the lineage, or derivation sequence, among data items.
> The lineage tracer will be DISABLED BY DEFAULT, so it will not affect normal Pig operation.
> I will add a new method to PigServer called "PigServer.showExamples(LogicalPlan)", which will cause my exgen algorithm to be invoked.
> I will also add a new command to Grunt, called ILLUSTRATE. Syntactically it will work the same way as the STORE command. For example, a user might type:
> grunt> visits = load 'visits.txt' as (user, url, timestamp);
> grunt> recent_visits = filter visits by timestamp >= '20071201';
> grunt> user_visits = group recent_visits by user;
> grunt> num_user_visits = foreach user_visits generate group, COUNT(recent_visits);
> grunt> illustrate num_user_visits
> This would trigger my exgen algorithm, which will display something like:
> visits:
> (Amy, www.cnn.com, 20070218)
> (Fred, www.harvard.edu, 20071204)
> (Amy, www.bbc.com, 20071205)
> (Fred, www.stanford.edu, 20071206)
> recent_visits:
> (Fred, www.harvard.edu, 20071204)
> (Amy, www.bbc.com, 20071205)
> (Fred, www.stanford.edu, 20071206)
> user_visits:
> (Fred, { (Fred, www.harvard.edu, 20071204), (Fred, www.stanford.edu, 20071206) } )
> (Amy, { (Amy, www.bbc.com, 20071205) } )
> num_user_visits:
> (Fred, 2)
> (Amy, 1)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
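
The lineage tracer described in the issue can be pictured with a small sketch. This is not the code in ExGenPatch and not Pig's API: it is plain Python, and every name in it (LineageTracer, new_item, base_inputs, run_pipeline, ROWS) is hypothetical. It assumes a tracer can be modelled as a side table mapping each derived item to the items it came from, and it replays the load / filter / group / count steps of the example on the four sample visits.

```python
from itertools import count

# Sample rows taken from the "visits" relation shown in the issue.
ROWS = [
    ("Amy", "www.cnn.com", "20070218"),
    ("Fred", "www.harvard.edu", "20071204"),
    ("Amy", "www.bbc.com", "20071205"),
    ("Fred", "www.stanford.edu", "20071206"),
]


class LineageTracer:
    """Hypothetical side structure: item id -> ids of the items it derives from."""

    def __init__(self, enabled=False):
        self.enabled = enabled   # off by default, mirroring the proposal
        self.parents = {}        # only filled in while tracing is enabled
        self._ids = count()

    def new_item(self, value, parents=()):
        """Wrap a value as (id, value) and record its parents if tracing."""
        item_id = next(self._ids)
        if self.enabled:
            self.parents[item_id] = [p[0] for p in parents]
        return (item_id, value)

    def base_inputs(self, item):
        """Walk the side table back to the ids of the original input items."""
        stack, found = [item[0]], set()
        while stack:
            current = stack.pop()
            parent_ids = self.parents.get(current, [])
            if parent_ids:
                stack.extend(parent_ids)
            else:
                found.add(current)
        return found


def run_pipeline(rows, tracer):
    """Replay the example: load, filter by timestamp, group by user, count."""
    visits = [tracer.new_item(r) for r in rows]
    recent = [tracer.new_item(v[1], [v]) for v in visits if v[1][2] >= "20071201"]
    groups = {}
    for r in recent:
        groups.setdefault(r[1][0], []).append(r)
    user_visits = [
        tracer.new_item((user, [m[1] for m in members]), members)
        for user, members in groups.items()
    ]
    num_user_visits = [
        tracer.new_item((g[1][0], len(g[1][1])), [g]) for g in user_visits
    ]
    return visits, recent, user_visits, num_user_visits


tracer = LineageTracer(enabled=True)
visits, recent, user_visits, num_user_visits = run_pipeline(ROWS, tracer)
for item in num_user_visits:
    # The visits were created first, so their ids equal their list positions.
    sources = [visits[i][1] for i in sorted(tracer.base_inputs(item))]
    print(item[1], "derived from", sources)
```

With tracing enabled, each count can be followed back to the visits that produced it, which is the kind of information an example generator would need in order to keep inputs that survive the filter. With the default enabled=False the side table stays empty, so the normal path does no extra bookkeeping.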