Return-Path: X-Original-To: apmail-pig-dev-archive@www.apache.org Delivered-To: apmail-pig-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A563A1078F for ; Mon, 17 Feb 2014 08:51:56 +0000 (UTC) Received: (qmail 59560 invoked by uid 500); 17 Feb 2014 08:51:55 -0000 Delivered-To: apmail-pig-dev-archive@pig.apache.org Received: (qmail 59461 invoked by uid 500); 17 Feb 2014 08:51:53 -0000 Mailing-List: contact dev-help@pig.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@pig.apache.org Delivered-To: mailing list dev@pig.apache.org Received: (qmail 59443 invoked by uid 99); 17 Feb 2014 08:51:52 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 17 Feb 2014 08:51:52 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 2860F1C067D; Mon, 17 Feb 2014 08:51:52 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============5476099100686157546==" MIME-Version: 1.0 Subject: Re: Review Request 17878: Make scalar work From: "Daniel Dai" To: "Rohini Palaniswamy" , "Cheolsoo Park" Cc: "pig" , "Daniel Dai" Date: Mon, 17 Feb 2014 08:51:52 -0000 Message-ID: <20140217085152.18781.6017@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Daniel Dai" X-ReviewGroup: pig X-ReviewRequest-URL: https://reviews.apache.org/r/17878/ X-Sender: "Daniel Dai" References: <20140217053045.18781.65156@reviews.apache.org> In-Reply-To: <20140217053045.18781.65156@reviews.apache.org> Reply-To: "Daniel Dai" X-ReviewRequest-Repository: pig --===============5476099100686157546== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit > On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote: > > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, lines 772-773 > > > > > > It would be easier to keep this here for curTezOp.getSplitParent() != null (no multiquery) > > Daniel Dai wrote: > Not sure if I understand, but seems there is nothing special for filter for splitter, can you be more specific? > > Rohini Palaniswamy wrote: > There is always a extra empty POFilter after POSplit. In MR it is removed in NoopFilterRemover.java. It is easy/efficient to do it here as it saves on traversing the whole plan again and then finding it and removing it. I try to do it in MultiQueryOptimizerTez, there is a TODO tag for that. Actually we need to check the filter condition, cuz not filter in splittee is redundant, right? > On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote: > > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 2110 > > > > > > You will need these, else there will be hanging operators in the plan > > Daniel Dai wrote: > I also changed compile(PhysicalOperator op), which now handles split differently, splitter will only be added once > > Rohini Palaniswamy wrote: > I am not very sure. Can you check once if removing this does not introduce PIG-3620 again? During my debugging, I manually examined bunch of plans, I am pretty sure it should not be the case. > On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote: > > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 2112 > > > > > > You will need this as well, else input will be pointing to wrong physical operataors. > > Daniel Dai wrote: > Checked seems to be Ok. Guess that is also due the change of how I generate the first split. Splitter's physical plan is composed with PhysicalPlan.connect, which should deal with inputs correctly. > > Rohini Palaniswamy wrote: > I hit this problem with multiquery off. With multiquery on this was never a problem. I can run e2e tests with mutiquery off to see. > On Feb. 17, 2014, 5:30 a.m., Rohini Palaniswamy wrote: > > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java, line 172 > > > > > > Might be easier if this optimizer is the first one to be called before Combiner and secondary key optimizer. Just a thought. If there are no issues with having it later, then fine. > > Daniel Dai wrote: > This is follow MR version. In MR, multiquery and combiner optimization are mutually exclusive (multiquery MR cannot have combiner), and we find combiner is more desired than multiquery, so we run combiner optimization first. Fortunately that is not the case for Tez. But I still want to keep the sequence of rules triggered to minimize Tez only issue. > > Rohini Palaniswamy wrote: > Earlier I had fixed the combiner and secondary key optimizer of Tez to work with splits. If you don't move it to beginning then you will have to fix MultiQueryOptimizer to copy over combine plans and secondary sort orders in the TezEdgeDescriptor. The good thing is combiner/secondary key property is on edge. When POSplit consume a vertex A, I keep the original edge from A to its successor. So combiner/secondary key property is still there. - Daniel ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17878/#review34061 ----------------------------------------------------------- On Feb. 17, 2014, 4:25 a.m., Daniel Dai wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17878/ > ----------------------------------------------------------- > > (Updated Feb. 17, 2014, 4:25 a.m.) > > > Review request for pig, Cheolsoo Park and Rohini Palaniswamy. > > > Bugs: PIG-3757 > https://issues.apache.org/jira/browse/PIG-3757 > > > Repository: pig > > > Description > ------- > > See PIG-3757 > > > Diffs > ----- > > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/CombinerPackager.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/MultiQueryOptimizerTez.java PRE-CREATION > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecondaryKeyOptimizerTez.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 1567297 > branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java 1567297 > branches/tez/src/org/apache/pig/impl/builtin/ReadScalarsTez.java PRE-CREATION > branches/tez/src/org/apache/pig/newplan/logical/visitor/ScalarVisitor.java 1567297 > > Diff: https://reviews.apache.org/r/17878/diff/ > > > Testing > ------- > > > Thanks, > > Daniel Dai > > --===============5476099100686157546==--