Return-Path: X-Original-To: apmail-pig-dev-archive@www.apache.org Delivered-To: apmail-pig-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1AD14DAFC for ; Fri, 17 Aug 2012 20:44:00 +0000 (UTC) Received: (qmail 91410 invoked by uid 500); 17 Aug 2012 20:43:59 -0000 Delivered-To: apmail-pig-dev-archive@pig.apache.org Received: (qmail 91383 invoked by uid 500); 17 Aug 2012 20:43:59 -0000 Mailing-List: contact dev-help@pig.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@pig.apache.org Delivered-To: mailing list dev@pig.apache.org Received: (qmail 91371 invoked by uid 99); 17 Aug 2012 20:43:59 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 17 Aug 2012 20:43:59 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 34D9D1C1B94; Fri, 17 Aug 2012 20:43:58 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============3579780589398361828==" MIME-Version: 1.0 Subject: Re: Review Request: PIG-2831: MR-Cube implementation (Distributed cubing for holistic measures) From: "Dmitriy Ryaboy" To: "Dmitriy Ryaboy" Cc: j.prasanth.j@gmail.com, "pig" Date: Fri, 17 Aug 2012 20:43:58 -0000 Message-ID: <20120817204358.23840.58013@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Dmitriy Ryaboy" X-ReviewGroup: pig X-ReviewRequest-URL: https://reviews.apache.org/r/6651/ X-Sender: "Dmitriy Ryaboy" References: <20120816221913.23841.37552@reviews.apache.org> In-Reply-To: <20120816221913.23841.37552@reviews.apache.org> Reply-To: "Dmitriy Ryaboy" --===============3579780589398361828== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6651/#review10473 ----------------------------------------------------------- Publishing the first set of comments (got through to MapReduceOper). More l= ater. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControl= Compiler.java does this break if we have 2 cube operators? should we add some identif= ier to the filename? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControl= Compiler.java no, let's rely on the general estimator framework rather than special-c= asing. Having separate chunks of code dealing with this through the codebas= e will lead to confusion. Now that we allow passing a full plan into the r= educer estimator, it will be possible to write an estimator that walks the = plan and uses factors based on knowledge of inputs and operators that work = on the input to adjust the naive estimation; we should encode any cube oper= ator-specific logic into such an estimator. = I think you can go ahead and remove this comment block. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControl= Compiler.java kill dead code. = let's note that the minimal number of reducers should be the max partit= ion value, and how to get it, in POCube documentation. That way, whoever im= plements an improved estimator can follow this advice when they encounter P= OCube. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java you are adding a lot of code to this class. At 2800 lines, it's already= too big a piece of code. = perhaps some of it can be factored out into CubeCompiler or something s= imilar? = src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java break this code out into its own function. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java can you add a comment describing what the plan looks like before and af= ter this is applied? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java COUNT.class.getName(), COUNT_STAR.class.getName() = = = src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java The trick of turning COUNTs into SUMs of COUNTS is already implicitly e= ncoded in COUNT's implementation of Algebraic. Could we use the algebraic i= nterface to do this generically, rather than hardcoding individual function= s in the compiler? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java why 0? "input" might be projecting something else, right? Can you reuse= "input"? src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java @Override src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java drop "=3D=3D true", it's already a boolean. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java as discussed, getting the total number of records is problematic. = What if we write a sampler that does something along the lines of openi= ng all input splits and reading from each until 2M are reached or input is = exhausted? = This doesn't let us use the 100K for 2m-2b heuristic, but it's somethin= g. = this is also where we can look at LoadFunc implementations and see if w= e can just get an estimated total # of records from it -- if we can, we don= 't have to do the above compromise, and can apply all the heuristics descri= bed by Arnab. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java go ahead and write a test. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java don't compare booleans to true. = = just do = if (className.equals(...)) { = } src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java same src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler= .java How come? That sounds a little scary. Let's understand this. src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceO= per.java can these be added to POCube and extracted from within the contained pl= ans instead? - Dmitriy Ryaboy On Aug. 16, 2012, 10:19 p.m., Prasanth_J wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/6651/ > ----------------------------------------------------------- > = > (Updated Aug. 16, 2012, 10:19 p.m.) > = > = > Review request for pig and Dmitriy Ryaboy. > = > = > Description > ------- > = > This is a review board request for https://issues.apache.org/jira/browse/= PIG-2831 > = > = > This addresses bug PIG-2831. > https://issues.apache.org/jira/browse/PIG-2831 > = > = > Diffs > ----- > = > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobCon= trolCompiler.java 8029dec = > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRComp= iler.java 1d05a20 = > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapRed= uceOper.java b87c209 = > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/= MRPrinter.java 157caad = > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/express= ionOperators/POCast.java cde340c = > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/P= hyPlanVisitor.java ff65146 = > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relatio= nalOperators/POCube.java PRE-CREATION = > src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java = 0502917 = > src/org/apache/pig/builtin/CubeDimensions.java 5652029 = > src/org/apache/pig/builtin/PigStorage.java 21e835f = > src/org/apache/pig/builtin/RollupDimensions.java f6c26e4 = > src/org/apache/pig/impl/builtin/HolisticCube.java PRE-CREATION = > src/org/apache/pig/impl/builtin/HolisticCubeCompoundKey.java PRE-CREATI= ON = > src/org/apache/pig/impl/builtin/PartitionMaxGroup.java PRE-CREATION = > src/org/apache/pig/impl/builtin/PostProcessCube.java PRE-CREATION = > src/org/apache/pig/impl/io/ReadSingleLoader.java PRE-CREATION = > src/org/apache/pig/impl/util/Utils.java 270cb6a = > src/org/apache/pig/newplan/logical/optimizer/LogicalPlanPrinter.java 13= 439c6 = > src/org/apache/pig/newplan/logical/relational/LOCube.java b262efb = > src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisito= r.java 127ab7a = > src/org/apache/pig/newplan/logical/rules/ColumnPruneHelper.java 369f5c2 = > src/org/apache/pig/parser/LogicalPlanBuilder.java 289a76f = > src/org/apache/pig/pen/EquivalenceClasses.java 194f8cb = > src/org/apache/pig/pen/LineageTrimmingVisitor.java 917073c = > src/org/apache/pig/pen/util/DisplayExamples.java 265f8f7 = > test/org/apache/pig/impl/builtin/TestHolisticCubeCompundKey.java PRE-CR= EATION = > test/org/apache/pig/impl/builtin/TestPartitionMaxGroup.java PRE-CREATIO= N = > test/org/apache/pig/impl/builtin/TestPostProcessCube.java PRE-CREATION = > test/org/apache/pig/test/TestCubeOperator.java 65d56a6 = > = > Diff: https://reviews.apache.org/r/6651/diff/ > = > = > Testing > ------- > = > Unit tests: All passed > = > Pre-commit tests: All passed > ant clean test-commit > = > = > Thanks, > = > Prasanth_J > = > --===============3579780589398361828==--