pig-dev mailing list archives

From "Cheolsoo Park" <piaozhe...@gmail.com>
Subject Re: Review Request 16313: PIG-3604 Implement replicated join in Tez
Date Wed, 18 Dec 2013 03:17:40 GMT


> On Dec. 17, 2013, 3:52 p.m., Rohini Palaniswamy wrote:
> > test/org/apache/pig/tez/TestTezCompiler.java, line 216
> > <https://reviews.apache.org/r/16313/diff/1/?file=398711#file398711line216>
> >
> >     Can we add cases for
> >      - three or four way join?
> >      - replicated table is part of a reduce output instead of being loaded directly.
> >       This is to handle the case where you don't create a separate vertex to broadcast, but
> >       broadcast from an existing vertex (POLocalRearrange) by just changing the edge type to
> >       broadcast. Don't think the TezCompiler handles this now.

I have just realized that I misunderstood the 2nd point. In my new patch, I handle the case
where the *fragmented* table is a predecessor's output and the replicated join happens in the
reducer. I don't yet handle the case where the *replicated* table is a predecessor's output.
Can I handle it in a separate jira?
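
For readers unfamiliar with the two roles, here is a minimal sketch of the general
fragment-replicate join idea: the replicated table is hashed in memory and the fragmented
table is streamed past it. This is not the POFRJoin or POFRJoinTez code from the patch; the
class name, method name, and row layout (join key in field 0) are assumptions made only for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Pig.
public class ReplicatedJoinSketch {

    // Inner-joins two tables on field 0. Each row is {key, value}.
    public static List<String[]> join(List<String[]> fragmented, List<String[]> replicated) {
        // Build phase: hash the (small) replicated table by join key.
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : replicated) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }

        // Probe phase: stream the (large) fragmented table and look up each key.
        List<String[]> out = new ArrayList<>();
        for (String[] left : fragmented) {
            List<String[]> matches = table.get(left[0]);
            if (matches == null) {
                continue; // inner join: drop rows without a match
            }
            for (String[] right : matches) {
                out.add(new String[] {left[0], left[1], right[1]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> big = Arrays.asList(
                new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
        List<String[]> small = Arrays.asList(
                new String[] {"1", "x"}, new String[] {"3", "y"});
        for (String[] row : join(big, small)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

In this sketch, where the rows of either input come from (a file load or a predecessor's
output) does not change the join logic itself; only how the rows reach the join differs.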


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16313/#review30533
-----------------------------------------------------------


On Dec. 18, 2013, 3:04 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16313/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2013, 3:04 a.m.)
> 
> 
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3604
>     https://issues.apache.org/jira/browse/PIG-3604
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Implemented replicated join in Tez as follows:
> - POFRJoinTez extends POFRJoin. The difference between the two is that the replication hash
>   table is constructed out of broadcasting edges in Tez instead of files on the distributed
>   cache in MR.
> - TezCompiler adds a vertex per replicated table and connects it to the POFRJoin vertex via
>   a broadcasting edge.
> 
> Note that in POLocalRearrangeTez, I package tuples in the same way for broadcast and
> scatter/gather edges, so I removed outputType (DataMovementType).
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java d7c54d8
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java e900751
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POFRJoinTez.java e69de29
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java cda5d89
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java d76cfc5
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e6f9be5 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 7a1736a 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 2584501 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 96ccdde 
>   test/e2e/pig/tests/tez.conf b280698 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC10.gld e69de29 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC11.gld e69de29 
>   test/org/apache/pig/tez/TestTezCompiler.java 79dc94e 
> 
> Diff: https://reviews.apache.org/r/16313/diff/
> 
> 
> Testing
> -------
> 
> Added a unit test case to TestTezCompiler.
> Added an e2e test case to Join.
> 
> ant test-tez passes.
> e2e test passes.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>

