hadoop-common-user mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Pig | Yahoo! Research
Date Mon, 30 Apr 2007 18:59:46 GMT
I haven't been reading this list like I should...

Pig is meant to provide a simpler and more powerful abstraction for writing 
distributed processing logic than mapreduce alone provides.


- We support joins.
- We provide the ability to select fields from records that will be passed to 
a function so that general functions can be written and reused.
- We do some amount of optimization at the moment (more in the future) to 
reduce the number of actual jobs that get run.


- The model has one kind of function: an eval function. It processes one 
record at a time. A dataset can have records grouped together or sorted or 
filtered or have a projection applied to it, but functions just need to work 
on one record at a time. If a=load 'dataset', MAP is foreach a generate 
map(*) and REDUCE is b=group a by $0; foreach b generate reduce(*)
- Pig Latin is a simple language that can be used directly. We have a 
simple interpreter called GRUNT that users can interact with to submit jobs.
- Eventually we would like to embed Pig Latin into Perl, Python and Ruby to 
create Erlpay, Ythonpay and Ubyray, but we are a bit low on developer 
bandwidth. We believe that by embedding Pig Latin into existing languages we 
would end up with a much more powerful, well-known, and natural environment to 
work in as opposed to creating our own language like Sawzall.
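
To make the "one kind of function" point concrete, here is a rough Python sketch of the model described above. It is not Pig code and not how Pig is implemented. The helper names (foreach, group_by_first, map_fn, reduce_fn) and the word-count data are made up for illustration, and the sketch assumes the grouping step runs on the output of the map step, as it would in a plain map/reduce job.

```python
from collections import defaultdict

def foreach(dataset, eval_fn):
    """Apply an eval function to each record, one record at a time."""
    out = []
    for record in dataset:
        out.extend(eval_fn(record))  # an eval function may emit several records
    return out

def group_by_first(dataset):
    """Rough stand-in for 'group a by $0': one (key, records) record per key."""
    groups = defaultdict(list)
    for record in dataset:
        groups[record[0]].append(record)
    return sorted(groups.items())

# Hypothetical eval functions for a word count. Each sees a single record.
def map_fn(record):
    (line,) = record
    return [(word, 1) for word in line.split()]

def reduce_fn(record):
    key, rows = record  # a grouped record: the key plus the records sharing it
    return [(key, sum(count for _, count in rows))]

a = [("pig latin",), ("pig on hadoop",)]  # stands in for a = load 'dataset'
mapped = foreach(a, map_fn)               # MAP: foreach a generate map(*)
b = group_by_first(mapped)                # b = group ... by $0
result = foreach(b, reduce_fn)            # REDUCE: foreach b generate reduce(*)
print(result)
```

Note that reduce_fn is still just an eval function. It handles one record at a time, and that record happens to carry a whole group.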


On Thursday 26 April 2007 15:17:24 Ian Holsman wrote:
> Jim Kellerman wrote:
> > Can someone comment on how Pig compares with Bigtable?
> >
> > On Thu, 2007-04-26 at 13:10 -0700, Doug Cutting wrote:
> >> FYI
> >>
> >> http://research.yahoo.com/project/pig
> >>
> >> Doug
> my understanding is
> bigtable/hbase stores the data
> mapreduce/hadoop manipulates/creates the data to be stored in bigtable
> via functions, and controls the distribution
> sawzall/pig is a query language to extract information from it. I think
> it would use create functions for mapreduce/hadoop to run.
> regards
> Ian
