pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "PigLatin" by OlgaN
Date Tue, 16 Oct 2007 20:28:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigLatin

------------------------------------------------------------------------------
  = Introduction to Pig Latin =
+ 
+ [[TableOfContents]]
  
  So you want to learn Pig Latin. Welcome! Let's begin with the data types.
  
@@ -13, +15 @@

   * A '''Data Bag''' is a collection of tuples (duplicate tuples are allowed). You may think of
it as a "table", except that Pig does not require that the tuple field types match, or even
that the tuples have the same number of fields! (It is up to you whether you want these properties.)
We denote bags by { } bracketing. Thus, a data bag could be {<apache.org,1.0>, <flickr.com,0.8>}.
   * A '''Data Map''' is a map from keys that are string literals to values that can be any
data type. Think of it as a !HashMap<String,X> where X can be any of the 4 Pig data
types. A Data Map supports the expected get and put interface. We denote maps by [ ] bracketing,
with ":" separating the key and the value, and ";" separating successive key-value pairs.
Thus, a data map could be [ 'apache' : <'pig', 'hadoop'> ; 'cnn' : 'news' ]. Here, the
key 'apache' is mapped to the tuple with 2 atomic fields 'pig' and 'hadoop', while the key
'cnn' is mapped to the data atom 'news'.
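
The bag and map descriptions above can be modeled roughly in Python (not Pig Latin). This is only a sketch of the semantics described here, not how Pig stores data; the one-field `cnn.com` tuple and the duplicated `flickr.com` tuple are made-up additions to show that a bag tolerates mixed tuple sizes and duplicates.

```python
from collections import Counter

# A tuple is modeled as a Python tuple of fields.
# A bag is modeled as a Counter (a multiset), so duplicate tuples are
# kept and tuples with different numbers of fields can coexist.
bag = Counter()
bag[("apache.org", 1.0)] += 1
bag[("flickr.com", 0.8)] += 1
bag[("flickr.com", 0.8)] += 1   # hypothetical duplicate: allowed in a bag
bag[("cnn.com",)] += 1          # hypothetical one-field tuple: also allowed

# A map goes from string keys to values of any data type
# (here a tuple of two atoms and a single atom).
data_map = {"apache": ("pig", "hadoop"), "cnn": "news"}

# Total number of tuples in the bag, counting duplicates.
bag_size = sum(bag.values())
```

Using a multiset rather than a plain set is the point of the sketch: a set would silently drop the second `flickr.com` tuple.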
  
- #DataItems
  == Data Items ==
  Data can be referred to in various powerful and convenient ways in Pig. Any data referred
to is called a Data Item. We will illustrate all these ways by using the following example
tuple.
  
@@ -28, +29 @@

  || Field referred to by position || $0 || Data Atom '1' || In Pig, positions start at 0
and not 1 ||
  || Field referred to by name || f2 || Bag {<2,3>,<4,6>,<5,7>} || ||
  || Projection of another data item || f2.$0 || Bag {<2>,<4>,<5>} - the
bag f2 projected to the first field || ||
- || Map Lookup against another data item || f3#'apache' || Data Atom 'pig' || User's responsibility
to ensure that a lookup is written only against a  data map, otherwise a runtime error is
thrown. If the key being looked up does not exist, a Data Atom with an empty string is returned
||
+ || Map Lookup against another data item || f3#'apache' || Data Atom 'pig' || * It is the user's responsibility
to ensure that a lookup is written only against a data map; otherwise a runtime error is
thrown. [[BR]] * If the key being looked up does not exist, a Data Atom with an empty string
is returned ||
  || Function applied to another data item || SUM(f2.$0) || 2+4+5 = 11 || SUM is a builtin
Pig function. See PigFunctions for how to write your own functions ||
  || Infix Expression of other data items || COUNT(f2) + f1 / '2.0' || 3 + 1 / 2.0 = 3.5 ||
||
  || Bincond, i.e., the value of the data item is chosen according to some condition ||
(f1 == '1' ? '2' : COUNT(f2)) || '2' since f1=='1' is true. If f1 were != '1', then the value
of this data item for t would be COUNT(f2)=3 || See [#CondS Conditions] for what the format
of the condition in the bincond can be ||
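
The rows of the table above can be mimicked in Python (not Pig Latin) as a rough sketch. The example tuple `t` is reconstructed from the values shown in the table; the contents of `f3` beyond the 'apache' key are unknown here, so a one-entry map is assumed. The helper names `project`, `lookup`, `pig_sum`, and `count` are hypothetical stand-ins, not Pig APIs.

```python
# Example tuple t with fields f1, f2, f3 (f3's full contents are assumed).
names = ("f1", "f2", "f3")
t = ("1", [("2", "3"), ("4", "6"), ("5", "7")], {"apache": "pig"})

def field(tup, name):
    """Refer to a field by name."""
    return tup[names.index(name)]

def project(bag, index):
    """Project every tuple of a bag onto a single field, like f2.$0."""
    return [(row[index],) for row in bag]

def lookup(item, key):
    """Map lookup, like f3#'apache'; a missing key gives an empty atom."""
    if not isinstance(item, dict):
        raise TypeError("map lookup against a non-map data item")
    return item.get(key, "")

def pig_sum(bag):
    """Sum the first field of every tuple in a bag of one-field tuples."""
    return sum(float(row[0]) for row in bag)

def count(bag):
    """Number of tuples in a bag."""
    return len(bag)

by_position = t[0]                                  # $0
projection = project(field(t, "f2"), 0)             # f2.$0
apache = lookup(field(t, "f3"), "apache")           # f3#'apache'
missing = lookup(field(t, "f3"), "cnn")             # key not present
total = pig_sum(projection)                         # SUM(f2.$0)
infix = count(field(t, "f2")) + float(field(t, "f1")) / float("2.0")
bincond = "2" if field(t, "f1") == "1" else count(field(t, "f2"))
```

Atoms are kept as strings and converted only where arithmetic is needed, which mirrors the quoted atoms ('1', '2.0') used in the table.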
@@ -43, +44 @@

  
  `grunt> A = load 'data' using PigStorage() as (x, y, z);`
  `grunt> B = group A by x;`
- `grunt> C = foreach B {`
+ `grunt> C = foreach B {`[[BR]]
- 
- `D = distinct A.y;`
+ `D = distinct A.y;` [[BR]]
- 
- `generate flatten(group), COUNT(D);`
+ `generate flatten(group), COUNT(D);` [[BR]]
+ `}`[[BR]]
- 
- `}`
  `grunt>` 
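
As a rough illustration of what the grunt session above appears to compute (for each value of x, the number of distinct y values), here is a Python sketch rather than Pig Latin. The function name `count_distinct_y` and the sample rows are hypothetical, and the sketch says nothing about how Pig actually evaluates the nested block.

```python
from collections import defaultdict

def count_distinct_y(rows):
    """Group (x, y, z) rows by x and count the distinct y values per group."""
    distinct_y = defaultdict(set)
    for x, y, _z in rows:
        distinct_y[x].add(y)   # the set plays the role of 'distinct A.y'
    # One output record per group: the group key and COUNT of distinct y.
    return {x: len(ys) for x, ys in distinct_y.items()}

# Hypothetical input standing in for the file 'data'.
rows = [
    ("a", 1, 10),
    ("a", 1, 20),   # same y as the row above, counted once
    ("a", 2, 30),
    ("b", 5, 40),
]
result = count_distinct_y(rows)
```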
  
+ 
