pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cheolsoo Park" <piaozhe...@gmail.com>
Subject Re: Review Request 19633: PIG-3743: Use VertexGroup and Alias vertex for union
Date Tue, 25 Mar 2014 21:43:59 GMT


> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > This will not handle union followed by store right? Can we create a separate jira
for that?

Correct. Like I said in the description, MROutput should be added directly to VertexGroup.

I will file a jira.


> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java,
lines 82-83
> > <https://reviews.apache.org/r/19633/diff/1/?file=535690#file535690line82>
> >
> >     Don't we have to iterate through all the values? Won't there be more than one
if shuffle edge is used and input grouped?

No, I don't think so. It should be handled by ConcatenatedMergedKeyValuesInput. If I am understanding
it correctly, calling reader.next() updates the underlying reader. So the following reader.getCurrentValues().iterator()
returns the new iterator if the previous reader is done.

In fact, I can confirm the result by testing union with multiple sources, e.g. 3+. It seems
correct.


> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 1979
> > <https://reviews.apache.org/r/19633/diff/1/?file=535691#file535691line1979>
> >
> >     Can you please add a TODO to use POValueOutputTez instead of POLocalRearrange
and unsorted shuffle with TEZ-661 and PIG-3775

Sure.


> On March 25, 2014, 9:27 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java, line 54
> > <https://reviews.apache.org/r/19633/diff/1/?file=535694#file535694line54>
> >
> >     hasVertexGroup()

I 100% agree hasVertexGroup() reads better, but since explain doesn't go through TezDagBuilder,
VertexGroup of tezOper is not set when TezPrinter traverses the plan.

Maybe I consolidate hasVertexGroup() and isUnion() into one, and then use it everywhere.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19633/#review38532
-----------------------------------------------------------


On March 25, 2014, 8:50 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19633/
> -----------------------------------------------------------
> 
> (Updated March 25, 2014, 8:50 p.m.)
> 
> 
> Review request for pig, Daniel Dai and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3743
>     https://issues.apache.org/jira/browse/PIG-3743
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> The patch reimplements union using Tez VertexGroup and GroupInputEdge.
> 
> The changes include-
> * Implemented POVertexGroupInputTez that takes ConcatenatedMergedKeyValuesInput from
VertexGroup.
> * TezCompiler inserts an alias vertex for union, and the alias vertex is converted to
VertexGroup by TezDagBuilder.
> * TezStats JobGraphBuilder removes alias vertices since they're not materialized by Tez,
and thus, there is no status for them.
> 
> Note that-
> * Further optimization is possible for the case where union is only followed by store.
In that case, we could directly attach a MROutput to VertexGroup instead of adding another
vertex that runs the MROutput. I'll follow up with this soon.
> * POLocalRearrangeTez is added to each union source because ConcatenatedMergedKeyValuesInput
expected ShuffledMergedInputs that requires sorting.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POUnionTezLoad.java e496ca8 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POVertexGroupInputTez.java e69de29

>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 245cade 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java bce8963 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java b3aa020 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java f00946c 
>   src/org/apache/pig/tools/pigstats/tez/TezStats.java feac11d 
>   test/org/apache/pig/test/data/GoldenFiles/TEZC19.gld e69de29 
>   test/org/apache/pig/tez/TestTezCompiler.java e71d838 
> 
> Diff: https://reviews.apache.org/r/19633/diff/
> 
> 
> Testing
> -------
> 
> ant test-tez passes except TestTezCompiler (known issue).
> tez e2e tests all pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message