pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Dai" <dai...@gmail.com>
Subject Re: Review Request 17878: Make scalar work
Date Mon, 17 Feb 2014 08:51:52 GMT


> On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote:
> > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java,
lines 772-773
> > <https://reviews.apache.org/r/17878/diff/2/?file=486610#file486610line772>
> >
> >     It would be easier to keep this here for curTezOp.getSplitParent() != null (no
multiquery)
> 
> Daniel Dai wrote:
>     Not sure if I understand, but seems there is nothing special for filter for splitter,
can you be more specific?
> 
> Rohini Palaniswamy wrote:
>     There is always a extra empty POFilter after POSplit. In MR it is removed in NoopFilterRemover.java.
It is easy/efficient to do it here as it saves on traversing the whole plan again and then
finding it and removing it.

I try to do it in MultiQueryOptimizerTez, there is a TODO tag for that. Actually we need to
check the filter condition, cuz not filter in splittee is redundant, right?


> On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote:
> > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java,
line 2110
> > <https://reviews.apache.org/r/17878/diff/2/?file=486610#file486610line2110>
> >
> >     You will need these, else there will be hanging operators in the plan
> 
> Daniel Dai wrote:
>     I also changed compile(PhysicalOperator op), which now handles split differently,
splitter will only be added once
> 
> Rohini Palaniswamy wrote:
>     I am not very sure. Can you check once if removing this does not introduce PIG-3620
again?

During my debugging, I manually examined bunch of plans, I am pretty sure it should not be
the case.


> On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote:
> > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java,
line 2112
> > <https://reviews.apache.org/r/17878/diff/2/?file=486610#file486610line2112>
> >
> >     You will need this as well, else input will be pointing to wrong physical operataors.
> 
> Daniel Dai wrote:
>     Checked seems to be Ok. Guess that is also due the change of how I generate the first
split. Splitter's physical plan is composed with PhysicalPlan.connect, which should deal with
inputs correctly.
> 
> Rohini Palaniswamy wrote:
>     I hit this problem with multiquery off. With multiquery on this was never a problem.

I can run e2e tests with mutiquery off to see.


> On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote:
> > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java,
line 172
> > <https://reviews.apache.org/r/17878/diff/2/?file=486612#file486612line172>
> >
> >     Might be easier if this optimizer is the first one to be called before Combiner
and secondary key optimizer. Just a thought. If there are no issues with having it later,
then fine.
> 
> Daniel Dai wrote:
>     This is follow MR version. In MR, multiquery and combiner optimization are mutually
exclusive (multiquery MR cannot have combiner), and we find combiner is more desired than
multiquery, so we run combiner optimization first. Fortunately that is not the case for Tez.
But I still want to keep the sequence of rules triggered to minimize Tez only issue.
> 
> Rohini Palaniswamy wrote:
>     Earlier I had fixed the combiner and secondary key optimizer of Tez to work with
splits. If you don't move it to beginning then you will have to fix MultiQueryOptimizer to
copy over combine plans and secondary sort orders in the TezEdgeDescriptor.

The good thing is combiner/secondary key property is on edge. When POSplit consume a vertex
A, I keep the original edge from A to its successor. So combiner/secondary key property is
still there.


- Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17878/#review34061
-----------------------------------------------------------


On Feb. 17, 2014, 4:25 a.m., Daniel Dai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17878/
> -----------------------------------------------------------
> 
> (Updated Feb. 17, 2014, 4:25 a.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3757
>     https://issues.apache.org/jira/browse/PIG-3757
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> See PIG-3757
> 
> 
> Diffs
> -----
> 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/CombinerPackager.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/MultiQueryOptimizerTez.java
PRE-CREATION 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecondaryKeyOptimizerTez.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java
1567297 
>   branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java
1567297 
>   branches/tez/src/org/apache/pig/impl/builtin/ReadScalarsTez.java PRE-CREATION 
>   branches/tez/src/org/apache/pig/newplan/logical/visitor/ScalarVisitor.java 1567297

> 
> Diff: https://reviews.apache.org/r/17878/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Daniel Dai
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message