From: Benjamin Reed
To: hadoop-user@lucene.apache.org
Cc: Ian Holsman
Subject: Re: Pig | Yahoo! Research
Date: Mon, 30 Apr 2007 11:59:46 -0700
Message-Id: <200704301159.47010.breed@yahoo-inc.com>
In-Reply-To: <463124F4.7080406@holsman.net>

I haven't been reading this list like I should...

Pig is meant to provide a more powerful and simpler abstraction for writing
distributed processing logic than mapreduce by itself provides.

Power:
- We support joins.
- We provide the ability to select fields from records that will be passed
  to a function, so that general functions can be written and reused.
- We do some amount of optimization at the moment (more in the future) to
  reduce the number of actual jobs that get run.

Simplicity:
- The model has one kind of function: an eval function. It processes one
  record at a time. A dataset can have records grouped together or sorted or
  filtered or have a projection applied to it, but functions just need to
  work on one record at a time. If a=load 'dataset', MAP is
  foreach a generate map(*) and REDUCE is
  b=group a by $0; foreach b generate reduce(*)
- Pig Latin is a simple language that can be used directly. We have a simple
  interpreter called GRUNT that users can interact with to submit jobs.
- Eventually we would like to embed Pig Latin into Perl, Python and Ruby to
  create Erlpay, Ythonpay and Ubyray, but we are a bit low on developer
  bandwidth. We believe that by embedding Pig Latin into existing languages
  we would end up with a much more powerful, well-known, and natural
  environment to work in, as opposed to creating our own language like
  Sawzall.

ben

On Thursday 26 April 2007 15:17:24 Ian Holsman wrote:
> Jim Kellerman wrote:
> > Can someone comment on how Pig compares with Bigtable?
> >
> > On Thu, 2007-04-26 at 13:10 -0700, Doug Cutting wrote:
> >> FYI
> >>
> >> http://research.yahoo.com/project/pig
> >>
> >> Doug
>
> my understanding is
>
> bigtable/hbase stores the data
> mapreduce/hadoop manipulates/creates the data to be stored in bigtable
> via functions, and controls the distribution
> sawzall/pig is a query language to extract information from it. I think
> it would use create functions for mapreduce/hadoop to run.
>
> regards
> Ian
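
For illustration, the MAP and REDUCE formulation in the reply above can be mimicked in plain Python. This is only a sketch of the one-record-at-a-time model, not how Pig is implemented. The helper names (foreach_generate, group_by_first_field, map_fn, reduce_fn) and the word-count data are hypothetical, and feeding the map output into the group step is an assumption made to get a complete example.

```python
from itertools import groupby
from operator import itemgetter


def foreach_generate(dataset, eval_fn):
    """Apply an eval function to each record, one record at a time.

    The eval function may emit zero or more output records per input record.
    """
    out = []
    for record in dataset:
        out.extend(eval_fn(record))
    return out


def group_by_first_field(dataset):
    """Group records on their first field ($0) into (key, [records]) tuples."""
    ordered = sorted(dataset, key=itemgetter(0))
    return [(key, list(records))
            for key, records in groupby(ordered, key=itemgetter(0))]


# Hypothetical eval functions for a word count.
def map_fn(record):
    (line,) = record
    return [(word, 1) for word in line.split()]


def reduce_fn(record):
    key, grouped = record
    return [(key, sum(count for _, count in grouped))]


a = [("pig latin",), ("pig hadoop",)]      # stands in for: a = load 'dataset'
mapped = foreach_generate(a, map_fn)       # foreach a generate map(*)
b = group_by_first_field(mapped)           # b = group ... by $0
reduced = foreach_generate(b, reduce_fn)   # foreach b generate reduce(*)
print(reduced)
```

Both steps go through the same foreach_generate helper, which is the point made in the reply: grouping changes the shape of the records, but the functions still see one record at a time.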