incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "PigProposal" by ChrisOlston
Date Fri, 14 Sep 2007 00:26:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The following page has been changed by ChrisOlston:

  == Proposal ==
- Pig consists of a language and an interactive shell. Pig's language, Pig Latin, is a simple
query algebra that lets you express data transformations such as merging data sets, filtering
them, and applying functions to records or groups of records. 
+ The Pig project consists of high-level languages for expressing data analysis programs,
coupled with infrastructure for evaluating these programs. The salient property of Pig programs
is that their structure is amenable to substantial parallelization, which in turn enables
them to handle very large data sets.
- Pig Latin has several key properties:
+ At the present time, Pig's infrastructure layer consists of a compiler that produces sequences
of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g.,
the Hadoop project). Pig's language layer currently consists of a textual language called
Pig Latin, which has the following key properties:
   1. ''Ease of programming''. It is trivial to achieve parallel execution of simple, "embarrassingly
parallel" data analysis tasks. Complex tasks composed of multiple interrelated data transformations
are explicitly encoded as data flow sequences, making them easy to write, understand, and
   2. ''Optimization opportunities''. The way in which tasks are encoded permits the system
to optimize their execution automatically, allowing the user to focus on semantics rather
than efficiency.
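
As a rough illustration of the idea above (a data flow sequence evaluated as a Map-Reduce program), here is a minimal single-process sketch in Python. It is not Pig Latin and not what Pig's compiler emits; the record layout, the `visits` data, and the function names `map_phase`, `shuffle`, and `reduce_phase` are all hypothetical, chosen only to show why a filter-then-group-then-aggregate flow parallelizes naturally.

```python
from collections import defaultdict

# Hypothetical input: (user, url, seconds_on_page) records.
visits = [
    ("alice", "a.example", 30),
    ("bob", "b.example", 5),
    ("alice", "b.example", 45),
    ("carol", "a.example", 12),
    ("bob", "a.example", 50),
]


def map_phase(records, min_seconds):
    """Filter records and emit (url, seconds) pairs.

    Each record is handled independently of the others, so this step
    could be split across many workers.
    """
    for _user, url, seconds in records:
        if seconds >= min_seconds:
            yield url, seconds


def shuffle(pairs):
    """Group emitted values by key, as a Map-Reduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; groups are independent of one another."""
    return {key: sum(values) for key, values in groups.items()}


# The data flow sequence: filter -> group by url -> sum per group.
totals = reduce_phase(shuffle(map_phase(visits, min_seconds=10)))
print(totals)
```

Because the map step touches one record at a time and the reduce step touches one group at a time, both can in principle be distributed without changing the program's meaning, which is the property the proposal relies on.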
