pig-dev mailing list archives

From "Rohini Palaniswamy" <rohini.adi...@gmail.com>
Subject Review Request 35491: PIG-4574: Eliminate identity vertex for order by and skewed join right after LOAD
Date Tue, 16 Jun 2015 07:20:00 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35491/
-----------------------------------------------------------

Review request for pig.


Bugs: PIG-4574
    https://issues.apache.org/jira/browse/PIG-4574


Repository: pig


Description
-------

Read the order by/skewed join input data directly from HDFS in the Partitioner vertex,
instead of getting it from the sampler vertex.

This JIRA does not optimize the following case:

A = LOAD 'x' ...;
B = LOAD 'y' ...;
C = UNION A, B;
D = ORDER C BY ..;

That case depends on UnionOptimizer being turned on and will need more changes, so I will
leave it for another JIRA.


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POIdentityInOutTez.java
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Limit-2.gld
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Order-1.gld
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Order-2.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-SkewJoin-1.gld
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-SkewJoin-2.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-16-OPTOFF.gld
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-Union-16.gld
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezAutoParallelism.java
1685498 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezCompiler.java
1685498


Diff: https://reviews.apache.org/r/35491/diff/


Testing
-------

Ran a subset of the e2e tests: SkewedJoin, Union, Order, MultiQuery_Self, MultiQuery_Union.

Ran L9.pig.

Before the patch:

File System Counters
    FILE_BYTES_READ=2028282366911
    FILE_BYTES_WRITTEN=4049785379197
    HDFS_BYTES_READ=1011533488395
    HDFS_BYTES_WRITTEN=1010554380555

After the patch:

File System Counters
    FILE_BYTES_READ=1007449863330
    FILE_BYTES_WRITTEN=2016036957653
    HDFS_BYTES_READ=2023066976790
    HDFS_BYTES_WRITTEN=1010554380555
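
To make the change in the counters easier to read, here is a small sketch that computes the per-counter difference. The counter values are copied from the two runs above; the variable names and the script itself are only illustrative and are not part of the patch.

```python
# Counter values copied from the L9.pig runs reported above.
before = {
    "FILE_BYTES_READ": 2028282366911,
    "FILE_BYTES_WRITTEN": 4049785379197,
    "HDFS_BYTES_READ": 1011533488395,
    "HDFS_BYTES_WRITTEN": 1010554380555,
}
after = {
    "FILE_BYTES_READ": 1007449863330,
    "FILE_BYTES_WRITTEN": 2016036957653,
    "HDFS_BYTES_READ": 2023066976790,
    "HDFS_BYTES_WRITTEN": 1010554380555,
}

# Difference per counter (after - before); negative means less I/O after the patch.
delta = {name: after[name] - before[name] for name in before}

for name in before:
    ratio = after[name] / before[name]
    print(f"{name}: {before[name]} -> {after[name]} "
          f"(delta {delta[name]:+d}, ratio {ratio:.2f})")
```

Local file reads and writes drop to roughly half, HDFS_BYTES_READ exactly doubles, and HDFS_BYTES_WRITTEN is unchanged. The doubling of HDFS reads appears consistent with the Partitioner vertex now reading the input from HDFS itself rather than receiving it from the sampler vertex.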


Thanks,

Rohini Palaniswamy

