pig-dev mailing list archives

From "Alex Bain" <ambclo...@gmail.com>
Subject Review Request 15219: PIG-3536 Implement DISTINCT for Pig-on-Tez
Date Tue, 05 Nov 2013 00:57:17 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for pig, Cheolsoo Park, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.

Bugs: PIG-3536

Repository: pig-git


Implement DISTINCT for Pig-on-Tez by providing a (very straightforward) implementation in

For the moment, this does NOT use two optimizations done in the MRCompiler. We will create
a separate JIRA for these optimizations:
1. A distinct combiner
2. A combiner optimizer that replaces certain uses of DISTINCT with an algebraic udf
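
To illustrate what the first deferred optimization would buy, here is a minimal, self-contained sketch in plain Java. It is not code from this patch and does not use Pig or Tez APIs. The class DistinctSketch and its methods combine, distinct, and shuffled are hypothetical names invented for this example, and records are modeled as plain strings. The assumption is that DISTINCT de-duplicates on the downstream side of a shuffle, and that a distinct combiner would drop duplicates within each upstream partition first, so that fewer records cross the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Pig or Tez.
final class DistinctSketch {

    // Stand-in for a distinct combiner: drop duplicates inside ONE
    // upstream partition before anything is sent across the shuffle.
    static List<String> combine(List<String> partition) {
        return new ArrayList<>(new LinkedHashSet<>(partition));
    }

    // Stand-in for the downstream side: merge every partition and emit
    // each record once. The result is the same with or without the combiner.
    static List<String> distinct(List<List<String>> partitions, boolean useCombiner) {
        Set<String> seen = new TreeSet<>();
        for (List<String> partition : partitions) {
            seen.addAll(useCombiner ? combine(partition) : partition);
        }
        return new ArrayList<>(seen);
    }

    // Number of records that would cross the shuffle in this toy model.
    static int shuffled(List<List<String>> partitions, boolean useCombiner) {
        int count = 0;
        for (List<String> partition : partitions) {
            count += (useCombiner ? combine(partition) : partition).size();
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c", "c"));
        System.out.println("distinct records: " + distinct(partitions, true));
        System.out.println("shuffled without combiner: " + shuffled(partitions, false));
        System.out.println("shuffled with combiner:    " + shuffled(partitions, true));
    }
}
```

In this toy model the output of distinct is identical either way; only the number of shuffled records changes, which is why the combiner can be added later without affecting correctness.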

[Small code note: I renamed getPlainForEach to getForEachPlain. That way, any future
getForEachHelper1, getForEachHelper2, etc. will all sort together alphabetically. Sorry if
that's a little too OCD.]


  src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java d62b2a1 
  test/e2e/pig/tests/tez.conf 24af8d3 
  test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld PRE-CREATION 
  test/org/apache/pig/tez/TestTezCompiler.java 1209d08 

Diff: https://reviews.apache.org/r/15219/diff/


This patch includes:
- A unit test in TestTezCompiler.java
- An e2e test

DANIEL: Can you check that my e2e test looks appropriate? I wasn't sure which test data set
to choose, so I just picked studenttab20m.


Alex Bain
